
Are games slower on CPUs with more cores? A look at CPUs and game optimization

via: CnBeta     time: 2019/7/28 10:30:12



In the high-end segment, the core-count war is even more intense. AMD's Ryzen 3900X, already on sale, has 12 cores and 24 threads, which is enough to make one gasp, and the upcoming 3950X has 16 cores and 32 threads. Many PC enthusiasts say they can hardly contain themselves.

But do CPUs with more cores always run games faster? Not necessarily. Setting aside differences in frequency and IPC, a CPU with more cores is sometimes even slower in games. For example, AMD's new 12-core 3900X is slower than the 8-core 3700X in some games.

Some tests show that the 3900X, which has more cores and a higher frequency, falls behind the 3700X in some cases.

Note that the 3900X beats the 3700X across the spec sheet: four more cores, higher clocks (3.8/4.6GHz vs 3.6/4.4GHz) and twice the level 3 cache (64MB vs 32MB). So why would the chip with more cores be slower in games?

Starting from this phenomenon, let's talk about CPUs and game optimization.

How difficult is multi-core optimization for games?

Let's start with multi-core optimization in games. Any discussion of game optimization has to address multi-core support. Which games are well optimized for multiple cores, and which leave one core struggling while the others look on, has always been a favorite topic among players.

Why do games struggle with multi-core optimization, while applications such as video compression can make full use of many cores? The answer lies in how games run.

Why do games favor a single core?

Video compression parallelizes easily: one thread compresses one segment, another thread compresses another, all cores work at once, and when every segment is done the whole video is compressed.
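The segment-by-segment idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `zlib` stands in for a real video encoder, and the function names, chunk size and worker count are hypothetical choices, not anything from a real compression tool.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut the input into independent segments of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def compress_parallel(data: bytes, chunk_size: int = 1 << 16, workers: int = 4) -> list[bytes]:
    """Compress every segment on a worker thread; no segment depends on another."""
    chunks = split_into_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the results in the same order as the input segments.
        return list(pool.map(zlib.compress, chunks))


def decompress_all(compressed: list[bytes]) -> bytes:
    """Reassemble the original data from the compressed segments."""
    return b"".join(zlib.decompress(c) for c in compressed)
```

Because each segment is self-contained, adding workers adds throughput without any coordination between them, which is exactly the property most game logic lacks.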

A game's computation is generally sequential: each step depends closely on the previous one, which makes it hard to spread across multiple threads.

For example, in an FPS game, when a player is hit and takes damage, the damage depends on the bullet's trajectory. The trajectory has to be calculated before the damage can be, so the two steps must run one after the other and cannot be computed simultaneously on separate threads.

To make full use of multiple cores, a game has to split its work, such as physics collisions and AI behavior, across threads, and the technical bar for doing so is fairly high. As a result, a large number of games still do not fully use all of the CPU's cores.
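A toy frame update makes the distinction concrete. Everything here is hypothetical (one-dimensional positions, made-up function names and numbers); it only shows the structure: the trajectory-then-damage chain must run in order, while physics and AI do not depend on each other and can be handed to separate workers. In CPython the thread pool illustrates the task split rather than a real speedup, since pure-Python code is limited by the interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def bullet_trajectory(origin: int, velocity: int, steps: int) -> list[int]:
    """Positions the bullet passes through (toy 1-D model)."""
    return [origin + velocity * t for t in range(steps + 1)]


def damage_from_trajectory(trajectory: list[int], target: int, base_damage: int) -> int:
    """Needs the finished trajectory, so it cannot start before that step ends."""
    return base_damage if target in trajectory else 0


def physics_step(positions: list[int], velocity: int) -> list[int]:
    """Move every object by its velocity; independent of the AI step."""
    return [p + velocity for p in positions]


def ai_step(enemies: list[int], player: int) -> list[int]:
    """Move every enemy one unit toward the player; independent of physics."""
    return [e + (1 if e < player else -1 if e > player else 0) for e in enemies]


def run_frame() -> tuple[int, list[int], list[int]]:
    # Dependent chain: step 2 consumes the output of step 1.
    trajectory = bullet_trajectory(origin=0, velocity=2, steps=5)
    damage = damage_from_trajectory(trajectory, target=6, base_damage=40)

    # Independent tasks: these can be submitted to different workers.
    with ThreadPoolExecutor(max_workers=2) as pool:
        physics = pool.submit(physics_step, [0, 10], 3)
        ai = pool.submit(ai_step, [1, 9], 5)
        return damage, physics.result(), ai.result()
```

The hard part in a real engine is finding enough work of the second kind; the first kind stays serial no matter how many cores are available.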

Does multi-core support guarantee good optimization?

As time goes on, more and more games are putting effort into multi-threaded optimization.

For example, a few years ago we often saw the dual-core i3 quietly beating everything in games. Now that big-budget games have raised the threshold to four cores, the dual-core i3 is in an awkward position.

Nevertheless, the 12-core 3900X still sometimes performs worse than the 8-core 3700X. Why?

This is mainly due to suboptimal CPU core scheduling. The Ryzen architecture is rather unusual: every four cores are grouped into a CCX, and every two CCXs are packaged into a CCD. Core-to-core communication may cross a CCX or even a CCD boundary, and crossing either one adds latency.

In other words, when a program uses multiple cores, there are several possible situations.

1. The cores used are all in the same CCX: lowest latency.

2. The cores span CCXs but stay within the same CCD: some latency.

3. The cores span both CCXs and CCDs: highest latency.
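The three cases above can be written as a small classifier. This sketch assumes the layout the article describes (four cores per CCX, two CCXs per CCD) and, purely for illustration, that cores are numbered contiguously so that cores 0-3 share the first CCX; real operating systems may number cores differently, and the names below are hypothetical.

```python
from typing import NamedTuple

CORES_PER_CCX = 4  # from the article: four cores form one CCX
CCX_PER_CCD = 2    # from the article: two CCXs form one CCD


class CoreLocation(NamedTuple):
    ccd: int  # index of the CCD the core sits on
    ccx: int  # index of the CCX within that CCD


def locate(core: int) -> CoreLocation:
    """Map a core index to its CCD and CCX (assumes contiguous numbering)."""
    ccx_global = core // CORES_PER_CCX
    return CoreLocation(ccd=ccx_global // CCX_PER_CCD, ccx=ccx_global % CCX_PER_CCD)


def latency_tier(core_a: int, core_b: int) -> str:
    """Classify communication between two cores into the three cases."""
    a, b = locate(core_a), locate(core_b)
    if a.ccd != b.ccd:
        return "cross-CCD"   # case 3: highest latency
    if a.ccx != b.ccx:
        return "cross-CCX"   # case 2: some latency
    return "same-CCX"        # case 1: lowest latency
```

With this numbering, `latency_tier(0, 3)` falls in case 1, `latency_tier(0, 4)` in case 2, and `latency_tier(0, 8)` in case 3.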

For example, if a game can use four cores, the ideal scenario is naturally four cores in the same CCX, which gives the best performance.

In practice, however, the code that assigns work to cores is not necessarily that smart, and it may well be unable to tell which cores sit in the same CCX. As a result, a game may end up using cores in different CCXs and CCDs, and the extra latency costs performance.

Knowing this, we can explain why the 3900X is sometimes slower than the 3700X. The 3900X packages two CCDs, each CCD has two CCXs, and each CCX has four cores, so the die layout provides 4x2x2 = 16 cores. Four of them are disabled, leaving 12.

The 3700X, by contrast, has only one CCD containing two CCXs, for a total of 4x2 = 8 cores. The 3900X therefore has one more CCD than the 3700X, which can introduce extra latency. If a game cannot exploit the additional cores of the 3900X, the 3900X may end up slightly slower than the 3700X.
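To see how much more cross-CCD traffic is possible on the 3900X, one can count every pair of cores by the boundary it crosses. This is a rough sketch, not a benchmark: it assumes the four disabled cores of the 3900X are spread evenly (three active cores per CCX), which the article does not state, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

Core = tuple[int, int, int]  # (ccd index, ccx index within the CCD, core index within the CCX)


def build_cores(ccds: int, ccx_per_ccd: int, active_cores_per_ccx: int) -> list[Core]:
    """List every active core with its position in the CCD/CCX hierarchy."""
    return [
        (ccd, ccx, core)
        for ccd in range(ccds)
        for ccx in range(ccx_per_ccd)
        for core in range(active_cores_per_ccx)
    ]


def pair_tiers(cores: list[Core]) -> Counter:
    """Count core pairs by the widest boundary their communication crosses."""
    counts: Counter = Counter()
    for a, b in combinations(cores, 2):
        if a[0] != b[0]:
            counts["cross_ccd"] += 1
        elif a[1] != b[1]:
            counts["cross_ccx"] += 1
        else:
            counts["same_ccx"] += 1
    return counts


ryzen_3700x = build_cores(ccds=1, ccx_per_ccd=2, active_cores_per_ccx=4)
# Assumption: 16 physical cores with one core disabled in each CCX.
ryzen_3900x = build_cores(ccds=2, ccx_per_ccd=2, active_cores_per_ccx=3)
```

Under these assumptions, none of the 28 core pairs on the 3700X cross a CCD, while 36 of the 66 pairs on the 3900X do, so a scheduler that places threads without regard to topology has many more ways to land on the slowest path.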

Therefore, even when a game is optimized for multiple cores, core scheduling needs further work to achieve the best performance.

Fortunately, Microsoft is aware of the problem. Windows 10 1903 includes an optimization for it: the scheduler prefers cores in the same CCX, avoiding the latency of crossing CCXs.

If you want to get the most out of an AMD Ryzen processor, upgrading to Windows 10 1903 is necessary.

Is CPU single-core performance really just "squeezing toothpaste"?

Some people think that CPUs can hardly make further breakthroughs in frequency or architectural efficiency, and that piling on cores is the only way left to improve performance.

Some readers, pointing to Intel's "toothpaste squeezing" (releasing only small, incremental gains each generation), argue that CPU performance at the same frequency has stagnated for years, and that AMD's Zen 2 architecture, although much more efficient than its predecessors, has only caught up with its competitor.

Gaming on a quad-core CPU from a few years ago and on a current quad-core CPU seems to feel no different, which is taken as strong evidence. But is that really the case?

In fact, this view is one-sided. CPUs from a few years ago still hold up well in some tests and games because those tests and games have not been optimized for the instruction sets of newer CPUs.

A large part of the value of recent CPUs lies in added instruction sets such as AVX, AVX2 and TSX.

If code uses these instruction sets, it can make more efficient use of execution units such as the FMA (fused multiply-add) floating-point units, reduce idle time in the CPU pipeline, and achieve considerable performance gains.

These instruction sets have been added steadily over the past decade, and they are neither core stacking nor toothpaste squeezing.

Take Cinebench, the well-known rendering benchmark and a CPU testing tool familiar to DIY players, as an example.

The latest version, Cinebench R20, adds AVX instruction set support as a major improvement over the older Cinebench R15.

On a CPU that supports the AVX instruction set, the same rendering project runs faster in Cinebench R20 than in Cinebench R15, which shows how much performance the new instruction set can bring.

Zen 2's single-core performance improved so much largely because of its dramatically better AVX2 performance.

AVX and newer instruction sets have gradually become the norm in rendering, video compression, scientific computing and similar fields. The well-known Linux distribution Fedora 32 even plans to drop support for CPUs without AVX.

However, a large number of games still have not adopted new instruction sets such as AVX and support only the older SSE. A new CPU running these games naturally performs much like an old one. When it comes to instruction set support, games still lack proper CPU optimization.

The well-known game benchmark suite 3DMark has recognized this. Its new Time Spy Extreme test adds support for AVX, AVX2 and even AVX512, and with AVX512 in use the score was more than double the score with SSE3.

New instruction sets such as AVX are also becoming more and more important in real games. For example, Assassin's Creed Odyssey at first did not run at all on CPUs without AVX (this proved too aggressive, and compatibility with older CPUs had to be restored later).
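One common way to use new instructions without locking out older CPUs is to pick a code path at startup based on what the processor reports. The sketch below shows the selection logic only, in Python for readability; real games do this in native code. The path names and the ordering are hypothetical, and the flag strings are illustrative samples in the style of the "flags" line of /proc/cpuinfo on Linux.

```python
def parse_cpu_flags(flags_line: str) -> set[str]:
    """Turn a whitespace-separated feature list into a set of lowercase flags."""
    return set(flags_line.lower().split())


# Candidate code paths, fastest first, each with the features it requires.
# Hypothetical table for illustration only.
CODE_PATHS: list[tuple[str, set[str]]] = [
    ("avx2", {"avx2", "fma"}),
    ("avx", {"avx"}),
    ("sse2", {"sse2"}),
]


def choose_code_path(flags: set[str]) -> str:
    """Return the fastest path whose required features are all present."""
    for name, required in CODE_PATHS:
        if required <= flags:  # subset test: every required flag is available
            return name
    return "scalar"  # fallback that needs no SIMD support at all


# Illustrative samples, not taken from real machines.
older_cpu = parse_cpu_flags("fpu sse sse2 ssse3 sse4_1 sse4_2 avx")
newer_cpu = parse_cpu_flags("fpu sse sse2 ssse3 sse4_1 sse4_2 avx avx2 fma")
```

Here `choose_code_path(older_cpu)` selects the AVX path while `choose_code_path(newer_cpu)` selects the AVX2 path, so the same build runs on both machines and still benefits from the newer instructions where they exist.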

Likewise, some games protected by Denuvo ("D encryption") need the FMA3 instruction set to decrypt and run correctly, so the once-celebrated Xeon E3-1230 V2 is left out in the cold. And if you use a PS3 emulator, you have seen the performance leap that the TSX instruction set brings.

Overall, instruction set optimization in most games is still insufficient. Without it, old and new CPUs cannot differ much in performance.

But supporting new instruction sets is an unavoidable part of CPU optimization in games. Only by using them can the value of a new CPU show. I hope more games will be optimized for new CPU instruction sets.

Closing words

Whether by adding cores and threads or by using new instruction sets to improve SIMD performance, CPU performance can be greatly enhanced.

In the consumer market, AMD seems to lean toward more cores, while Intel is committed to new instruction sets. Whichever direction is taken, software must be optimized accordingly to bring out the CPU's full performance.

This is no longer an era in which a new CPU delivers all of its performance without changing a single line of code. Held back by the lack of game support, multi-core designs and advanced instruction sets are reduced to "fighting for the future", with benefits that will only show later.

CPUs are not just squeezing toothpaste, and CPU optimization is far from finished. I hope we will see more games that bring out the real power of the CPU.
