China's first heterogeneous computing processor IP core successfully implemented in silicon

Huaxia Core recently announced that its heterogeneous computing processor IP core has been successfully implemented in silicon and has passed HSA (Heterogeneous System Architecture) conformance testing. The company also announced a new open source project for machine learning and deep neural networks to further promote the development of HSA heterogeneous computing. The product is a core technological breakthrough for Chinese domestic companies in heterogeneous computing, and is expected to find applications in artificial intelligence, machine vision, Industry 4.0, mobile communications, unmanned aerial vehicles and other fields.

What is heterogeneous computing?

Homogeneous computing builds a system from computing units that share the same instruction set and architecture. Heterogeneous computing builds a system from computing units with different instruction sets and architectures; common kinds of computing unit include the CPU, GPU, DSP, ASIC and FPGA. Heterogeneous computing can be summed up in a simple formula, "CPU + XXX", where the CPU side chiefly concerns improvements in single-core and multi-core capability. For example, the APU that AMD has focused on developing is a form of heterogeneous computing, and its formula is CPU + GPU.

Every kind of unit has its own specialty: the CPU, GPU, DSP, ASIC and FPGA each have their strengths. In some scenarios, introducing a specialized computing unit turns the system into a hybrid structure in which the CPU, GPU, DSP and FPGA each perform the tasks they are best at. If general-purpose computation running on the CPU, parallel computation running on the GPU, and optimized computation running on a DSP or ASIC/FPGA can be combined seamlessly, the application gains better performance, lower power consumption and other benefits. Compared with homogeneous computing, the hybrid structure may therefore also have a performance advantage.

Already widely used in supercomputing

In fact, heterogeneous computing is familiar to most readers: Tianhe-2 and Sunway TaihuLight, both of which have topped the TOP500 list, use it. The reason is that heterogeneous computing delivers both high performance and high performance per watt. Take a Tianhe-2 compute node as an example.

A Xeon E5 draws up to 145 W at full power and delivers 0.21 TFLOPS of double-precision floating-point performance, while a Xeon Phi draws 300 W and delivers 1 TFLOPS.

A Tianhe-2 compute node consists of two Xeon E5s and three Xeon Phis. Its theoretical double-precision floating-point performance is 3.42 TFLOPS at a power consumption of 1190 W, a ratio of 2.87 GFLOPS/W. Eight Xeon E5s at about the same power would deliver only 1696 GFLOPS of theoretical double-precision performance, a ratio of 1.42 GFLOPS/W. The data show that, at the same power consumption, adding Xeon Phi accelerators gives roughly twice the theoretical double-precision performance of using Xeon E5s alone.
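
The per-watt comparison above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the names (`XEON_E5`, `XEON_PHI`, `node_totals`) are illustrative, and the CPU-only case assumes the same 1190 W budget the text appears to divide by.

```python
# Peak double-precision GFLOPS and full-load watts, as quoted in the text.
XEON_E5 = {"gflops": 210.0, "watts": 145.0}
XEON_PHI = {"gflops": 1000.0, "watts": 300.0}


def node_totals(parts):
    """Sum peak GFLOPS and watts over a list of (device, count) pairs."""
    gflops = sum(dev["gflops"] * count for dev, count in parts)
    watts = sum(dev["watts"] * count for dev, count in parts)
    return gflops, watts


# Heterogeneous node: two Xeon E5 plus three Xeon Phi.
hetero_gflops, hetero_watts = node_totals([(XEON_E5, 2), (XEON_PHI, 3)])
hetero_eff = hetero_gflops / hetero_watts  # about 2.87 GFLOPS/W

# CPU-only case: the text quotes 1696 GFLOPS for eight Xeon E5.
# Assumption: the efficiency is taken over the same 1190 W budget.
cpu_only_gflops = 1696.0
cpu_only_eff = cpu_only_gflops / hetero_watts

# How many times more efficient the heterogeneous node is.
advantage = hetero_eff / cpu_only_eff

print(f"heterogeneous: {hetero_eff:.2f} GFLOPS/W")
print(f"CPU only:      {cpu_only_eff:.2f} GFLOPS/W")
print(f"advantage:     {advantage:.2f}x")
```

Eight Xeon E5s at 145 W each would draw 1160 W rather than 1190 W, so the CPU-only figure shifts slightly depending on which wattage is used; the roughly twofold advantage holds either way.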

Precisely because heterogeneous computing has so many theoretical advantages, some media have called "CPU + XXX" the next-generation processor.

A self-developed instruction set

The HSA Foundation exists to promote the adoption of heterogeneous computing in all its aspects. Huaxia Core is one of its members and works with the others to bring heterogeneous computing to more and more fields.

Many people may never have heard of Huaxia Core, but it is the first integrated circuit design company in China with its own heterogeneous computing processor IP core. Huaxia Core is a Chinese company that targets the global market. Its R&D team is drawn from around the world, but the technology and the personnel alike belong to Huaxia Core. Its 3-in-1 "Unity" architecture, for example, is a proprietary technology and a leader in the industry.

Even more unusual, Huaxia Core did not license the ARM instruction set as some companies have done, but developed its own instruction set, microarchitecture and toolchain. Its capacity for independent innovation is therefore fundamentally different from that of the so-called "domestic" CPU makers that buy ARM IP licenses.

IP available for external licensing

Huaxia Core recently announced a series of new IP products available for licensing worldwide, offering customers first-class processor designs. All Huaxia Core processors support HSA. Li Keyi, chairman of Huaxia Core, said, "We are very pleased to see Huaxia Core's new IP core pass HSA PRM conformance testing. The power-optimized IP cores are available for licensing worldwide for industrial, networking, advanced driver assistance systems (ADAS) and embedded systems."

Huaxia Core is understood to be one of only two companies in China able to license processor IP externally (the other is Godson, also known as Loongson). At a time when most Chinese CPU companies still buy foreign IP and integrate it, a company that can develop its own IP cores and license them abroad is particularly rare. Huaxia Core's CPU cores can now be supplied to customers through IP licensing. The first CPU has taped out at TSMC on the 28nm HPC process and is being offered to potential customers for evaluation or development from the third quarter of 2016.

Notable technical innovation

The IP core realized in silicon is the first implementation of Huaxia Core's 3-in-1 "Unity" architecture. Unity offers multidimensional signal processing capabilities, including image and video processing. Dr. Mayan Moudgill, CTO of Huaxia Core's US R&D center, said, "The vector processing length can be set dynamically according to the needs of the vector computation, up to a maximum configuration of 64KB. This meets the parallel computing demands of a wide range of intelligent applications and high-performance computing, while keeping the instruction architecture robust and software code portable. When processing large amounts of data, the variable-length vector processing unit (VPU) combines low power consumption, out-of-order pipelines and other advanced techniques to execute multiple control threads efficiently."
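
The portability benefit of a dynamically set vector length can be illustrated with a strip-mined loop, sketched below. This is not Huaxia Core's instruction set or toolchain: `vector_add` and its `vector_len` parameter are hypothetical names, and plain Python lists stand in for vector registers. The point is only that code written against a length chosen at run time gives the same result whatever length the hardware provides.

```python
def vector_add(a, b, vector_len):
    """Add two equal-length lists in chunks of at most vector_len elements.

    vector_len stands in for the hardware vector length, which is only
    known at run time; the loop never hard-codes it.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if vector_len < 1:
        raise ValueError("vector_len must be at least 1")

    out = []
    i = 0
    while i < len(a):
        # Number of active elements this iteration; the final chunk
        # may be shorter than the full vector length.
        n = min(vector_len, len(a) - i)
        out.extend(x + y for x, y in zip(a[i:i + n], b[i:i + n]))
        i += n
    return out


data = list(range(10))
ones = [1] * 10

# The same code runs unchanged with a short or a long vector length.
narrow = vector_add(data, ones, vector_len=4)
wide = vector_add(data, ones, vector_len=64)
print(narrow == wide)
```

A fixed-width design would instead bake the chunk size into the code, so software would need rewriting or recompiling for each hardware width.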

It is noteworthy that two days after Huaxia Core's worldwide announcement, ARM announced a similar vector extension to its ARMv8-A architecture, the Scalable Vector Extension (SVE), whose technical characteristics closely resemble those of Huaxia Core's VPU. This shows that Chinese processor design companies engaged in real innovation can reach the international state of the art in core technology. Their capacity to innovate far exceeds that of companies that obtain CPU designs through IP licensing, since licensees face extremely harsh restrictions on their room for independent innovation.

How does it differ from an SoC?

Many SoCs on the market also integrate CPU, GPU and DSP computing units. The Qualcomm Snapdragon and Huawei HiSilicon Kirin chips, for example, each integrate a CPU, a GPU and a DSP. So how does Huaxia Core's design differ from these SoCs?

The SoCs mentioned above are heterogeneous multi-core chips that integrate CPU, GPU and DSP cores with different instruction architectures, different microarchitectures and different toolchains. They tend to rely on the operating system to synchronize and coordinate cores and tasks, which hurts efficiency. It is worth stressing that multi-core programming on a traditional SoC is very difficult: although the cores are physically integrated on a single chip, development practices and processes differ little from those of a traditional board-level system. The result is modest performance, considerable power consumption, multiple toolsets, multiple teams, difficult development and even more difficult optimization.

By contrast, a single core based on Huaxia Core's Unity architecture already provides CPU, IVP (image and video processor) and DSP processing capability, yet it has one instruction architecture, one microarchitecture and one toolchain. For multi-core extension, hardware accelerators and FPGA integration, the Unity architecture will comply fully with the HSA specification and can reuse the HSA software ecosystem. Huaxia Core's 3-in-1 approach therefore has a large advantage in power consumption, cost, performance, software development threshold, team size and other respects.

Applications in machine learning

Machine learning algorithms are used in many applications, and most of them require highly parallel computation, which makes HSA platforms an ideal choice for them. Huaxia Core's combination of CPU + DSP + IVP in particular gives it an advantage in performance per watt. That is why Parmance and Huaxia Core are cooperating on the ML-HSA project, which targets machine learning and deep neural networks and is optimized for the open source gccbrig project that Huaxia Core initiated. The gccbrig project provides a finalizer (final translation) capability for any platform supported by the GCC compiler.

As to whether Huaxia Core will be constrained by its software ecosystem, Dr. John Glossner, CEO of Huaxia Core's US R&D center and chairman of the HSA Foundation, takes the view that HSA member companies do not each maintain a separate ecosystem. Instead, the HSA Foundation, whose members include the world's major processor manufacturers, builds the ecosystem jointly, which makes success far more likely. "Over the past four years, the HSA Foundation has developed hardware and software infrastructure to support heterogeneous systems. The developer ecosystem supports open source implementations of compilers, runtimes and other components. The HSA Foundation is also committed to providing applications that are portable across platforms. I am pleased that Huaxia Core is taking part in the HSA developer and application programming ecosystem," Glossner said.


In traditional chips such as the CPU, GPU and DSP, a relatively large gap remains between China and the West. In heterogeneous computing processors, however, the gap between China and the international state of the art is small, and in some areas China is even in the lead. For example, following Tianhe-2, the domestic supercomputer Sunway TaihuLight has again reached the top of the TOP500 list to become the world's fastest computer. The successful development of Huaxia Core's heterogeneous computing processor is a useful attempt by China in this field. I sincerely hope that Huaxia Core, Godson, Sunway and other companies engaged in independent processor innovation will, within our generation, deliver independently developed, leading-edge domestic processors and resolve the difficulties facing national security and the development of the information industry.

