At the 2018 Baidu AI Developers Conference, Baidu founder, chairman, and CEO Robin Li announced Kunlun, the company's self-developed AI chip. Baidu's chip effort builds on its experience using FPGAs for AI acceleration, as well as years of work on software-defined accelerators and the XPU architecture.
Baidu began developing AI architectures on FPGAs as early as 2010, launched small-scale deployments in 2011, had deployed more than 10,000 FPGAs by 2017, announced its self-developed AI chip in 2018, taped it out successfully in the second half of 2019, and began mass production in 2020.
Kunlun is a general-purpose AI chip, aimed at delivering high performance, low cost, and high flexibility. "Compared with GPUs, Kunlun already does well on generality and programmability, and we are still working to do better on programmability," Ouyang Jian said during the sharing session.
After the launch, more details were disclosed over time. Architecturally, Kunlun has two compute units, 512 GB/s of memory bandwidth, and 16 MB of SRAM per unit. Ouyang Jian noted that the 16 MB of SRAM is very helpful for AI inference. Within the XPU architecture, XPU-SDNN is designed for tensor workloads, while XPU-Cluster handles general-purpose processing.
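To see why a 16 MB on-chip SRAM matters for inference, consider weight reuse: if a layer's weights fit entirely in SRAM, they are streamed from DRAM once and then reused across requests instead of being re-fetched every pass. The sketch below is an illustrative back-of-the-envelope calculation using the figures in the article (16 MB SRAM, 512 GB/s bandwidth); the example layer size and INT8 assumption are hypothetical, not Kunlun specifics.

```python
# Hedged illustration (not Kunlun's actual toolchain): why a 16 MB on-chip
# SRAM helps inference. Figures from the article; layer size is hypothetical.

SRAM_BYTES = 16 * 1024 * 1024   # 16 MB of SRAM per compute unit (from the article)
DRAM_BANDWIDTH = 512e9          # 512 GB/s memory bandwidth (from the article)

def fits_in_sram(params: int, bytes_per_param: int = 1) -> bool:
    """True if a layer's weights fit entirely in on-chip SRAM (INT8 assumed)."""
    return params * bytes_per_param <= SRAM_BYTES

# Hypothetical fully connected layer: 4096 x 4096 INT8 weights = 16 MiB exactly.
fc_params = 4096 * 4096
print(fits_in_sram(fc_params))  # True: the weights just fit in SRAM

# Time to stream those weights from DRAM once at 512 GB/s; this cost is
# avoided on every subsequent pass once the weights are resident in SRAM.
stream_seconds = fc_params / DRAM_BANDWIDTH
print(f"{stream_seconds * 1e6:.1f} us per avoided DRAM pass")
```

Under these assumptions, each inference that reuses SRAM-resident weights saves roughly 33 µs of DRAM traffic for this one layer, which is why large on-chip SRAM translates directly into lower and more stable inference latency.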
On flexibility and ease of use, Kunlun provides developers with a software stack similar to NVIDIA's CUDA, programmable in C/C++, which lowers the barrier to development.
Based on the first-generation Kunlun chip, Baidu has so far launched two AI accelerator cards, the K100 and the K200, with the K200 offering twice the compute power, and twice the power consumption, of the K100.
On the BERT/ERNIE models commonly used in natural language processing, Kunlun shows a clear performance advantage.
In online production workloads, Kunlun's performance is more stable than NVIDIA's T4, and it holds an advantage in latency.
On the YOLOv3 object detection algorithm, Kunlun still leads, though by a smaller margin. Ouyang Jian said Baidu is continuing to improve Kunlun's performance through ongoing optimization.
He also said Kunlun has already been deployed at scale inside Baidu. As for external AI compute, Baidu began offering Kunlun compute through Baidu Cloud on December 13 last year, by invitation only. At present, access remains invitation-only and mainly takes the form of private deployments. Baidu plans to offer Kunlun compute at scale through Baidu Cloud after gathering feedback from the invited customers, but he gave no specific timeline.
Beyond Baidu Cloud, Ouyang Jian also demonstrated the Kunlun accelerator card in industrial intelligent devices, showing a CPU paired with a Kunlun card detecting product defects. Kunlun greatly improved detection speed, though no specific comparison figures were given.
Another demonstration was Kunlun's trump card: adaptation to the domestic Feiteng (Phytium) processor platform. At the 2019 Feiteng Ecological Partner Conference, Ouyang Jian revealed that the Kunlun AI chip was being adapted to and performance-optimized for domestic Feiteng servers. In the online session, he showed the significant speedup in image segmentation that the Kunlun accelerator card brings on that platform.
For Kunlun, a representative domestic chip, pairing with Feiteng is a natural fit, and it targets the sizable market for domestically developed chips.
A Feiteng CPU combined with a Kunlun AI accelerator lets both sides better advance the localization of server hardware, and the combination can be seen as an important driver, even a killer feature, for the future growth of Kunlun AI chips and accelerator cards.