Microsoft's road to AI chips has actually been seven or eight years in the making.

Source: 博客园 (Cnblogs)     Published: 2018/6/13 22:31:49

Microsoft is the reason why Intel bought Altera last summer.

In a September 2016 interview with Wired magazine, Diane Bryant, then Intel's executive vice president (she left Intel at the end of 2017 to become COO of Google Cloud), explained why the semiconductor giant had paid a sky-high $16.7 billion in 2015 to acquire Altera, the second largest FPGA vendor in the world. Microsoft was obviously not the only reason for the acquisition, but it was undoubtedly one of the most important.


The story, though, starts with Microsoft's Project Catapult in 2010.

The origin of Microsoft and FPGA

In 2010, Microsoft was still led by its second CEO, Steve Ballmer. At the time, Lu Qi was president of Microsoft's Online Services Division and head of the Bing project. Bing Search was then one of Microsoft's few online businesses, and it was chasing the formidable Google search engine in both result quality and responsiveness; the latter is a core measure of the technical capability behind a search engine.

It was against this backdrop that Microsoft's involvement with FPGAs began.

At that time, Microsoft's search engine was an online service that relied on thousands of machines, each driven by a CPU. Although companies such as Intel kept improving their CPUs, the chips still could not keep up. In other words, services such as Bing Search had already outgrown the processor capability that Moore's Law predicted, and experience showed that simply adding more CPUs did not solve the problem.

Manufacturing a dedicated chip for each emerging demand, however, is very expensive. FPGAs make up for exactly this shortfall, so Bing decided to have its engineers build chips that are faster and less power-hungry than general-purpose CPUs while remaining customizable, in order to cope with changing technology and business models.

In December 2010, two days before Christmas, 39-year-old Microsoft researcher Andrew Putnam spent about five hours completing a hardware design that could run Bing's machine learning algorithms on FPGAs. Lei Feng Network (public account of the same name) has learned that Putnam had worked for five years as a researcher at the University of Washington, mainly on FPGAs. In 2009 he was recruited to Microsoft by Doug Burger, a Microsoft researcher working on computer chips, who later became his boss at Microsoft.


Andrew Putnam's hardware design was carried out under Doug Burger's guidance. It would later become Project Catapult, although it did not yet have that name.

Later, working from this hardware design, Burger's team built a prototype and showed that it could speed up Bing's machine learning algorithms by a factor of 100. The prototype eventually caught Lu Qi's attention, and in December 2012, under the name Project Catapult, it was presented to Microsoft CEO Steve Ballmer.

Microsoft then gave Burger enough funding to equip 1,600 servers with FPGAs for testing. With help from hardware manufacturers in China and Taiwan, the team spent six months building the hardware and tested it on a set of racks in a Microsoft data center. Over several months spanning 2013 and 2014, the tests showed that Bing's "decision tree" machine learning algorithms ran 40 times faster with the help of the new chips.

In the summer of 2014, Microsoft stated that it would soon deploy the hardware in Bing's live data centers.

From Bing to Azure

The FPGA's striking ability to speed up data processing was valued not only by the Bing business unit. It also caught the eye of Microsoft's other online businesses: one was the Azure cloud computing business, the other Office 365. Of course, in terms of contribution to Microsoft's overall revenue, Azure clearly carries more weight.

Thus the idea that began with Bing, using FPGAs to accelerate Azure's data centers, finally won acceptance inside Microsoft. According to Mark Russinovich, Azure's chief architect, Project Catapult had the potential to solve Azure's problems, but not in the same way it had for Bing. His team needed a programmable chip in each server, placed where the server connects to the main network, so that data traffic could be processed before it reached the server.


In 2014, 23 co-authors from Microsoft, Amazon, Columbia University, Google, and other organizations published a long paper through the IEEE titled "A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services". The work received technical support from Altera and Quanta. Andrew Putnam and Doug Burger, mentioned above, are both among the authors, and Microsoft executives including Lu Qi and Shen Xiangyang also supported the paper.

The paper centers on Project Catapult from Microsoft Research. It states that, to give data centers capabilities that traditional servers lack, Microsoft introduced a new reconfigurable fabric in which each instance contains 48 Stratix V FPGAs (Stratix is an Altera chip brand). Each FPGA is embedded in a server, attached through the PCIe interface, and wired directly to other FPGAs with 10 Gb SAS cables.
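The article does not say how the 48 FPGAs in one instance are wired to each other. As a rough illustration only, the sketch below assumes a 6 x 8 two-dimensional torus (an assumption here, not something the article states), in which every FPGA has four direct links that wrap around at the edges. The function names are hypothetical.

```python
# Illustrative sketch: 48 FPGAs laid out as a 6 x 8 two-dimensional torus.
# The 6 x 8 layout is an assumption; the article only gives the total of 48.
ROWS, COLS = 6, 8  # 6 * 8 = 48 FPGAs, one per server


def torus_neighbors(row, col, rows=ROWS, cols=COLS):
    """Return the four FPGAs directly cabled to the FPGA at (row, col)."""
    return [
        ((row - 1) % rows, col),  # north (wraps around at the top edge)
        ((row + 1) % rows, col),  # south (wraps around at the bottom edge)
        (row, (col - 1) % cols),  # west
        (row, (col + 1) % cols),  # east
    ]


def hop_distance(a, b, rows=ROWS, cols=COLS):
    """Minimum number of FPGA-to-FPGA links between positions a and b."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    # On a torus, going "the other way round" can be shorter.
    return min(dr, rows - dr) + min(dc, cols - dc)


if __name__ == "__main__":
    print(torus_neighbors(0, 0))
    print(hop_distance((0, 0), (3, 4)))
```

Under this assumed layout, FPGAs can exchange data over their direct links without going through the servers' CPUs or the data center network, and the worst-case path between any two FPGAs stays short.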

Lei Feng Network notes that the real-world case discussed in the paper is still the Bing web search engine, but it already demonstrated that FPGAs could be used in large data centers.

Finally, in Project Catapult's third-generation prototype, the FPGA sits at the edge of each server and plugs directly into the network, while the FPGAs still form a pool that any machine can access, which improves scalability. To achieve this, the FPGA researchers had to redesign the hardware. The end result is that the Project Catapult hardware costs only 30% of the total cost of all the other components in the server and requires less than 10% of the operating energy, yet it delivers 2 times the original processing speed.
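To put the three figures quoted above side by side, here is a back-of-the-envelope calculation. It assumes that the cost and energy percentages are overheads added on top of the rest of the server and that the 2x figure applies to the whole server's throughput; the article does not spell out either point. The function name is hypothetical.

```python
# Figures quoted in the article, relative to a server without the FPGA board.
extra_cost = 0.30   # FPGA hardware cost as a fraction of the rest of the server
extra_power = 0.10  # article says "less than 10%"; 10% is used as an upper bound
speedup = 2.0       # processing speed relative to the original


def gain_per_unit(speedup, overhead):
    """Throughput gain per unit of resource, given a fractional overhead."""
    return speedup / (1.0 + overhead)


perf_per_cost = gain_per_unit(speedup, extra_cost)    # 2.0 / 1.3
perf_per_watt = gain_per_unit(speedup, extra_power)   # 2.0 / 1.1

print(f"throughput per unit of hardware cost: {perf_per_cost:.2f}x")
print(f"throughput per unit of power:         {perf_per_watt:.2f}x")
```

Under these assumptions, throughput per unit of hardware cost improves by roughly 1.5x and throughput per unit of power by at least roughly 1.8x, since the power overhead is stated as an upper bound.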


So Azure embraced FPGAs, and so did Office 365; Doug Burger says FPGAs will drive all Microsoft services.

Project Brainwave

In 2016, AlphaGo's match against Lee Sedol made "AI" the label for a new stage of technological development, and even for a new era. In the wave of enthusiasm for artificial intelligence that followed, the concept of the AI chip took off as well. Yet the concept by itself falls short of covering the exploration and effort that giants such as Microsoft and Google have put into the field.

Building on Project Catapult, Microsoft has kept going further down this road.

At the Hot Chips conference in 2017, Microsoft demonstrated Project Brainwave, an FPGA-based, low-latency deep learning cloud platform. According to Microsoft's own evaluation, when running on Intel's Stratix 10 FPGAs (by then Altera had been acquired by Intel), Brainwave achieves 39.5 teraflops on a large GRU (gated recurrent unit) without any batching.


Project Brainwave is divided into three layers. The first is a high-performance, distributed FPGA system architecture: attaching FPGAs directly to the data center network lets them be served as hardware microservices. The second is a DNN processing unit integrated into the FPGA. Finally, Project Brainwave provides a software stack that supports popular deep learning frameworks such as the Microsoft Cognitive Toolkit and Google TensorFlow.

Microsoft stated:

The system is designed for real-time AI, which means it can process requests as soon as the data arrives, with very low latency. Real-time AI is becoming more and more important as cloud infrastructure needs to handle real-time data streams, whether for search requests, video, sensor data streams, or user interactions.

As you can see, Project Brainwave improves on Project Catapult in many respects, including speed, the degree of hardware/software integration, and fitness for AI workloads. More importantly, Project Brainwave was designed from the very beginning not only to help Bing run complex computations such as deep learning, but also to be opened up to external developers via Azure.

At the Build conference in May 2018, Microsoft announced the public preview of Project Brainwave. This architecture for deep neural network processing can be used in Azure and in edge environments. According to Microsoft, Project Brainwave makes Azure the fastest cloud platform for running artificial intelligence in real time and is fully integrated with Azure Machine Learning; it also supports Intel's FPGA hardware and ResNet50-based neural networks. In addition, development of Project Brainwave for Azure Stack and Azure Data Box is in progress.

Even Microsoft itself admits, however, that Project Brainwave is essentially built on the results of Project Catapult. It is one continuous project that has simply played different roles at different stages of its development.


From Project Catapult to Project Brainwave, all told, seven or eight years have passed.

A word from Lei Feng Network

Describing Microsoft's work on Project Brainwave, or even Project Catapult, as AI chip work is entirely fair. Of course, Microsoft has done more in the so-called AI chip area, for example the HPU built for HoloLens, although that falls into the category of on-device AI.

Still, Microsoft's work on AI chips has long been an established fact, so there is no need to be startled by the latest related industry news. It can be summed up in one sentence: Microsoft's Azure division was merely recruiting chip talent, and that got blown up into big news.
