On the afternoon of April 2, Ouyang Jian, general manager of Baidu's AI chip business, shared details of the Kunlun chip for the first time in an open online class, disclosing a series of benchmark comparisons between the Kunlun K200 and NVIDIA's T4 GPU. The most striking figure: on an int8 GEMM benchmark, the K200 delivers three times the performance of the T4. Ouyang Jian also demonstrated, via video, what he called Kunlun's trump card: the chip has been successfully adapted to the domestic Feiteng (Phytium) processor.
At the 2018 Baidu AI Developers Conference, Baidu founder, chairman and CEO Robin Li announced Kunlun, the company's self-developed AI chip. Baidu's chip effort builds on its early work using FPGAs for AI acceleration, as well as years of experience with software-defined accelerators and the XPU architecture.
Baidu began AI architecture research and development on FPGAs as early as 2010, launched small-scale deployments in 2011, had more than 10,000 FPGAs deployed by 2017, announced its self-developed AI chip in 2018, taped out successfully in the second half of 2019, and began mass production in 2020.
More details on Kunlun emerged after the announcement. Architecturally, Kunlun has two compute units, 512 GB/s of memory bandwidth, and 16 MB of SRAM per unit. Ouyang Jian noted that the 16 MB of SRAM is particularly helpful for AI inference. Within the XPU architecture, XPU-SDNN units are designed for tensor operations, while XPU clusters handle general-purpose processing.
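Why large on-chip SRAM matters for inference can be illustrated with a back-of-the-envelope roofline calculation. The sketch below combines the 512 GB/s bandwidth quoted here with the 260 TOPS peak figure given elsewhere in the article; the helper functions are illustrative, not Baidu's methodology.

```python
# Roofline sketch: is a large int8 GEMM compute-bound or bandwidth-bound
# on a chip with a 260 TOPS peak and 512 GB/s of memory bandwidth?
# The figures are the publicly quoted Kunlun numbers; names are illustrative.

PEAK_OPS = 260e12        # 260 TOPS (int8)
BANDWIDTH = 512e9        # 512 GB/s

def machine_balance(peak_ops, bandwidth):
    """Ops the chip must perform per byte moved to stay compute-bound."""
    return peak_ops / bandwidth

def gemm_intensity(n, bytes_per_elem=1):
    """Arithmetic intensity (ops/byte) of an n x n x n GEMM.

    2*n^3 multiply-accumulate ops over 3*n^2 matrix elements
    (A, B and C each touched once; int8 = 1 byte per element).
    """
    return (2 * n**3) / (3 * n**2 * bytes_per_elem)

balance = machine_balance(PEAK_OPS, BANDWIDTH)   # ~508 ops/byte
intensity = gemm_intensity(4096)                 # ~2731 ops/byte

# intensity >> balance: a 4K x 4K GEMM can be compute-bound, provided the
# 16 MB on-chip SRAM keeps tiles resident instead of re-reading DRAM.
print(f"machine balance: {balance:.0f} ops/byte")
print(f"4K GEMM intensity: {intensity:.0f} ops/byte")
```

The point of the comparison: only when the SRAM is large enough to hold working tiles does the arithmetic intensity of the workload, rather than DRAM bandwidth, determine throughput.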
The first-generation Kunlun chip does not use NVLink; it connects over a PCIe 4.0 interface. Built on Samsung's 14nm process with 2.5D packaging, the chip reaches a peak performance of 260 TOPS at 150 W of power consumption.
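For context, 260 TOPS at 150 W works out to roughly 1.7 TOPS per watt. A minimal arithmetic sketch is below; the T4 figures used for comparison (130 INT8 TOPS at a 70 W TDP) are NVIDIA's commonly published specs, not numbers from this article, and are an assumption here.

```python
# Performance-per-watt comparison sketch.
# Kunlun figures (260 TOPS, 150 W) are from the article; the T4 figures
# (130 INT8 TOPS, 70 W TDP) are NVIDIA's published specs, assumed for context.

def tops_per_watt(tops, watts):
    """Peak int8 throughput per watt of rated power."""
    return tops / watts

kunlun = tops_per_watt(260, 150)   # ~1.73 TOPS/W
t4 = tops_per_watt(130, 70)        # ~1.86 TOPS/W

print(f"Kunlun: {kunlun:.2f} TOPS/W, T4: {t4:.2f} TOPS/W")
```

On these numbers, Kunlun's raw peak throughput is roughly double the T4's, while efficiency per watt lands in the same range.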
At present, based on the first-generation Kunlun chip, Baidu has launched two AI accelerator cards, the K100 and the K200, the latter offering twice the computing power and power consumption of the former.
In the open class, Ouyang Jian presented a series of K200 benchmarks against the NVIDIA T4. For int8 GEMM on 4K x 4K matrices, the Kunlun K200 scores above 2,000 on the benchmark, more than three times the T4's result.
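The operation being benchmarked, an int8 matrix multiply with wider accumulation, can be sketched in NumPy. This is an illustrative reimplementation at a reduced size, not Baidu's benchmark code.

```python
import numpy as np

# Illustrative int8 GEMM of the kind measured in the K200 vs. T4 benchmark.
# int8 products are accumulated in int32 to avoid overflow, the standard
# practice for quantized inference. Size reduced from 4096 for a quick run.
rng = np.random.default_rng(0)
n = 256
a = rng.integers(-128, 128, size=(n, n), dtype=np.int8)
b = rng.integers(-128, 128, size=(n, n), dtype=np.int8)

c = a.astype(np.int32) @ b.astype(np.int32)   # accumulate in int32

# Each output element needs n multiply-adds: 2*n^3 ops per GEMM.
# At n = 4096 (the benchmarked size) this is about 137 billion ops.
ops = 2 * n**3
print(f"{ops / 1e6:.0f} million ops for a {n}x{n} int8 GEMM")
```

Hardware int8 units execute this with fused multiply-accumulate arrays, which is why int8 GEMM is the headline metric for inference accelerators.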
On BERT/ERNIE, test models commonly used in natural language processing, Kunlun shows a clear performance advantage.
On live production workloads, Kunlun's performance is more stable than the NVIDIA T4's, and its latency is lower.
On the object detection algorithm YOLOv3, Kunlun still leads, but by a narrower margin. Ouyang Jian said Baidu continues to improve Kunlun's performance through ongoing optimization.
He added that Kunlun is already deployed at scale inside Baidu. As for external AI computing power: on December 13 last year, Baidu began offering Kunlun compute through Baidu Cloud on an invitation-only basis. In the live Q&A, Leifeng.com learned that Kunlun compute on Baidu Cloud remains invitation-only, mainly in the form of private deployments. Baidu plans to offer Kunlun compute at scale through Baidu Cloud based on feedback from the invited customers, but Ouyang Jian gave no specific timeline.
Beyond cloud access, Ouyang Jian also demonstrated the Kunlun accelerator card in industrial intelligent devices, showing a CPU paired with a Kunlun card detecting product defects. Kunlun greatly improves detection speed, though no specific comparison data was given.
The other demonstration was Kunlun's trump card: adaptation to the domestic Feiteng processor platform. At the 2019 Feiteng Ecosystem Partner Conference, Ouyang Jian revealed that the Kunlun AI chip was being adapted to and performance-optimized for domestic Feiteng servers. In the online sharing session, he showed the marked acceleration in image segmentation that the Kunlun card brings on that platform.
For Kunlun, a flagship among domestic chips, pairing well with Feiteng clearly positions it for the large market for domestically developed chips.