On the afternoon of April 2, Ouyang Jian, general manager of Baidu's intelligent chip unit, presented the Kunlun chip in detail for the first time in an open class and disclosed a set of comparisons between the Kunlun K200 and the NVIDIA T4 GPU. The most favorable figure is the GEMM INT8 benchmark, where the K200 delivers three times the T4's performance. In a video, Ouyang Jian also showed Kunlun's trump card: it is well adapted to the domestic Feiteng processor.
At the Baidu AI Developers Conference in 2018, Baidu founder, chairman and CEO Robin Li announced Kunlun, the company's self-developed AI chip. Baidu's AI chip R&D builds on its experience using FPGAs for AI acceleration and on its years of work on software-defined accelerators and the XPU architecture.
Baidu began using FPGAs for AI architecture research and development in 2010 and launched a small-scale online deployment in 2011. It had deployed more than 10,000 FPGAs by 2017, announced its self-developed chip in 2018, taped it out successfully in the second half of 2019, and started mass production in 2020.
Kunlun is positioned as a general-purpose AI chip. The goal is to deliver an AI chip with high performance, low cost, and high flexibility.
After Kunlun was announced, more details were released over time. Architecturally, Kunlun has two computing units, 512 GB/s of memory bandwidth, and 16 MB of SRAM per unit. Ouyang Jian explained that the 16 MB of SRAM is very helpful for AI inference. Within the XPU architecture, XPU-SDNN is designed for tensor and similar operations, while the XPU cluster handles general-purpose processing.
The first-generation Kunlun chip does not use NVLink; it connects over a PCIe 4.0 interface. Built on Samsung's 14 nm manufacturing process with 2.5D packaging, the chip reaches a peak performance of 260 TOPS at a power consumption of 150 W.
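The two headline figures above can be combined into a rough efficiency number. The following is a minimal arithmetic sketch in Python; the variable names are illustrative, and it assumes the quoted peak performance and power consumption apply at the same operating point, which the talk did not state explicitly.

```python
# Figures quoted in the article for the first-generation Kunlun chip.
peak_tops = 260   # peak performance, in tera-operations per second
power_w = 150     # power consumption, in watts

# Peak efficiency, assuming both figures describe the same operating point.
tops_per_watt = peak_tops / power_w

print(f"{tops_per_watt:.2f} TOPS/W")
```

This is a peak figure only; efficiency on a real workload depends on how much of the peak the workload can actually use.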
For flexibility and ease of use, Kunlun provides developers with a software stack similar to NVIDIA's CUDA. It can be programmed in C/C++, which lowers the barrier to development.
Based on the first-generation Kunlun chip, Baidu has so far launched two AI accelerator cards, the K100 and the K200. The K200 has twice the computing power and twice the power consumption of the K100.
In the talk, Ouyang Jian presented a series of figures comparing the K200 with the NVIDIA T4. On the GEMM INT8 benchmark with 4K x 4K matrices, the Kunlun K200 scored more than 2000, more than three times the NVIDIA T4's result.
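For readers unfamiliar with the benchmark, a GEMM INT8 test multiplies two matrices of 8-bit integers and accumulates the products in a wider integer type. The pure-Python sketch below illustrates the operation and its size. The function names are hypothetical, "4K" is assumed to mean 4096, and this is not the benchmark code used in the talk.

```python
INT8_MIN, INT8_MAX = -128, 127

def gemm_int8(a, b):
    """Multiply two int8 matrices given as lists of rows.

    Python integers are unbounded, so they stand in for the wider
    (typically 32-bit) accumulator that hardware uses.
    """
    for row in a + b:
        for value in row:
            if not INT8_MIN <= value <= INT8_MAX:
                raise ValueError(f"{value} does not fit in int8")
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]

def gemm_ops(m, n, k):
    """Operation count: one multiply and one add per inner-product term."""
    return 2 * m * n * k

# Tiny demonstration of the arithmetic.
c = gemm_int8([[1, 2], [3, 4]], [[5, 6], [7, 8]])

# Work in one 4K x 4K GEMM, assuming "4K" means 4096.
ops_4k = gemm_ops(4096, 4096, 4096)

# Largest possible accumulated magnitude for a 4096-term int8 inner product.
worst_case = INT8_MIN * INT8_MIN * 4096
```

Under these assumptions one 4K x 4K GEMM takes about 137 billion operations, and the worst-case accumulated value is 2^26, which fits comfortably in a 32-bit accumulator.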
Kunlun also shows a clear performance advantage on the BERT/ERNIE test models commonly used in natural language processing.
In online production data, Kunlun's performance is more stable than the NVIDIA T4's, and it also has a latency advantage.
Kunlun also leads on the YOLOv3 algorithm in the image segmentation test, but by a smaller margin. Ouyang Jian said Baidu is still improving Kunlun's performance through continuous optimization.
He also said Kunlun is already deployed at scale inside Baidu. As for supplying AI computing power externally, Baidu began offering Kunlun computing power through Baidu Cloud on December 13 last year, by targeted invitation. During the live Q&A with Ouyang Jian, Leiphone learned that Kunlun computing power on Baidu Cloud is still offered by targeted invitation, mainly as private deployments. Baidu will collect feedback from the invited customers and then offer Kunlun's computing power at scale through Baidu Cloud, but he did not give a specific timeline.
Besides offering Kunlun's computing power through Baidu Cloud, Ouyang Jian also showed the Kunlun accelerator card at work in industrial intelligent devices. He demonstrated product defect detection running on a CPU and on a Kunlun accelerator card. Kunlun greatly improved the speed, but he gave no specific comparison figures.
The other demonstration was Kunlun's trump card: its adaptation to the domestic Feiteng processor platform. At the 2019 Feiteng Ecosystem Partner Conference, Ouyang Jian revealed that the Kunlun AI chip was being adapted to domestic Feiteng servers and tuned for performance. In today's online talk, he showed the significant speedup in image segmentation achieved with the Kunlun accelerator card.
As a representative domestic chip, Kunlun has chosen to adapt closely to Feiteng, clearly targeting the large market for domestically developed chips.
By pairing a Feiteng CPU with a Kunlun AI accelerator, both sides can better advance the adoption of domestic chips in the server market. The pairing can also be seen as an important growth driver, and a trump card, for Kunlun AI chips and accelerator cards in the future.