Benchmarks have always been a popular way to measure the performance of hardware, in both the PC and mobile spaces. Benchmark cheating occurs when a device maker unfairly games a benchmark to make results appear better than they actually are. Cheating can happen in any benchmark test, including those that measure CPU, system, or GPU performance. The obvious goal of cheating at benchmarks is to convince customers that product A is better than product B. Unfortunately, it also means that underlying weaknesses in the hardware are masked. Put simply, it’s a lose-lose situation for all parties involved in the industry: chip vendors, device makers, and most of all, end consumers.
Cheating was rampant in the PC space more than a decade ago, but thankfully, the practice has mostly ended. In the mobile world, Andrei Frumusanu (now the mobile editor at AnandTech) discovered benchmark cheating on the Exynos variant of the Samsung Galaxy S4 in 2013. The AnandTech mobile editors then discovered different varieties of benchmark cheating being done by many smartphone vendors. Quite a few major smartphones such as the Galaxy S4 and the Samsung Galaxy Note 3 were even de-listed from benchmarks like 3DMark, as companies such as Futuremark opted to publicly call out the device makers.
A new development shows that this assumption of a cleaned-up industry was premature. AnandTech editors Andrei Frumusanu and Ian Cutress have published a report detailing verified instances of benchmark cheating in GPU benchmarks by 2018 Huawei and Honor smartphones such as the Huawei P20, Huawei P20 Pro, and the Honor Play.
Summary: 2018 Huawei/Honor phones caught cheating in GPU benchmarks
AnandTech notes that in its review, the Huawei P20’s performance had regressed compared to the Huawei Mate 10 Pro. At the time, the publication was told by Huawei that it was a firmware issue, but in reality, that wasn’t the case. Huawei and Honor’s newer phones ship with a benchmark detection mechanism that enables a much higher power limit for the SoC, along with greater thermal headroom.
As explained by AnandTech, this means that for certain whitelisted applications, the latest Huawei and Honor phones perform far better than they do in similar, non-whitelisted software. This practice results in higher power consumption, lower efficiency, and reduced battery life.
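To illustrate what package-name whitelisting of this kind could look like, here is a minimal sketch. Everything in it is hypothetical: the package names, the function, and the power figures are invented for illustration and are not taken from Huawei’s actual firmware (the 3.5W and 8.5W values echo AnandTech’s measurements, discussed below).

```python
# Hypothetical sketch of package-name-based benchmark detection.
# Package names and limits are illustrative, not Huawei's real code.

BENCHMARK_WHITELIST = {
    "com.example.gfxbench",   # hypothetical benchmark package names
    "com.example.3dmark",
}

NORMAL_POWER_LIMIT_W = 3.5   # roughly what the chassis can sustain
BOOST_POWER_LIMIT_W = 8.5    # short-burst peak seen in benchmarks

def soc_power_limit(foreground_package: str) -> float:
    """Return the SoC power budget for the app in the foreground."""
    if foreground_package in BENCHMARK_WHITELIST:
        # Whitelisted benchmark: lift the limit well beyond what the
        # phone can sustain thermally.
        return BOOST_POWER_LIMIT_W
    # Everything else runs under the normal throttled budget.
    return NORMAL_POWER_LIMIT_W
```

Because a check like this keys on the application’s identity rather than its workload, an identical test shipped under a different package name would fall through to the normal limit — which is precisely how internal, non-public benchmark builds can expose the difference.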
The net result is that performance numbers are higher than they should be, and unachievable for any user in a real-world scenario. The publication also states that the efficiency of the SoC decreases in this mode, as it’s being “pushed well outside its standard operating window.” All in all, it makes the SoC look worse for the sake of higher benchmark numbers.
AnandTech states that Huawei’s benchmark behavior exceeds anything the publication has seen in the past. The editors used custom editions of the benchmarks so that they could test with detection on and off. In their words, the massive differences in performance between the publicly available benchmarks and the internal versions they use are “absolutely astonishing.”
GFXBench Scores on the Honor Play (Kirin 970) with Benchmark Detection off versus on. Source: AnandTech.
Huawei’s response to AnandTech‘s report
AnandTech spoke to Dr. Wang Chenglu, President of Software at Huawei’s Consumer Business Group. Dr. Wang admitted that the company was cheating at benchmarking because “others do the same thing, get high scores, and Huawei cannot stay silent.”
Dr. Wang stated that Huawei “wants to come together with others in China to find the best verification benchmark for user experience.” He accused other manufacturers of misleading with their numbers, citing an unnamed popular smartphone manufacturer in China as the biggest culprit. According to him, benchmark cheating is becoming “common practice in China,” and while Huawei wants to “open up” to consumers, it has trouble doing so when competitors “continually post unrealistic scores.”
Huawei’s goal is for standardization of benchmarks to level the playing field, and they told AnandTech that they want the media to help. As AnandTech notes, however, Huawei is promoting its own unrealistic scores for now.
In reaction to the AnandTech report, Huawei says it will ensure that benchmark data in future presentations is independently verified by third parties at the time of the announcement.
AnandTech‘s findings in detail
AnandTech found a surprising difference between the scores produced by its internal versions of GPU benchmarks and those from the publicly available versions. The publication tested the Huawei P20, Huawei P20 Pro, and the Honor Play. All three phones performed nearly identically in the higher power mode because they share the same HiSilicon Kirin 970 SoC. However, the real performance of the phones varies significantly, as they have different thermal limits stemming from their different chassis and cooling designs, as AnandTech explained. The Huawei P20 Pro has the best thermals (it is larger and more expensive than the other two), and it can therefore perform better in its true performance state.
An important point raised by the publication is the difference in the method of benchmark cheating. Mr. Frumusanu stated:
“In the past we’ve seen vendors actually raise the SoC frequencies, or locking them to their maximum states, raising performance beyond what’s usually available to generic applications. What Huawei instead is doing is boosting benchmark scores by coming at it from the other direction – the benchmarking applications are the only use-cases where the SoC actually performs to its advertised speeds. Meanwhile every other real-world application is throttled to a significant degree below that state due to the thermal limitations of the hardware. What we end up seeing with unthrottled performance is perhaps the ‘true’ form of an unconstrained SoC, although this is completely academic when compared to what users actually experience.” – Andrei Frumusanu and Ian Cutress, AnandTech
The power graphs demonstrate that Huawei’s newest phones already reach 3.5-4.4W in their true performance state, while 3.5W is the maximum TDP that can be sustained. When running the publicly available benchmarks, however, the phones blow far past that limit, with power figures going above 6W and peaking at 8.5W. As noted by AnandTech, these figures quickly trigger an overheating notification on the device, signifying the mismatch between the hardware’s thermal limits and the software’s expectations.
Overheating Warning on Huawei/Honor devices. Source: AnandTech.
The takeaway here is that the true performance figures aren’t stable, as they depend on the phone’s temperature. Huawei doesn’t block the GPU from reaching its peak frequency state. The default behavior is actually a “harsh thermal throttling mechanism […] that will try to maintain significantly lower SoC temperature levels and overall power consumption.”
The phones’ normal mode can reach the same peak power consumption figures during the GPU benchmarks as the ones posted by the unthrottled variants. However, these numbers quickly fall back significantly, and AnandTech notes that the phone throttles down to 2.2W in some cases, which has the effect of significantly reducing performance.
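The throttling behavior described above can be caricatured as a simple control loop. The thresholds and the 15% step size below are invented for illustration and do not reflect Huawei’s firmware; only the 8.5W peak and 2.2W floor echo AnandTech’s measurements.

```python
# Toy model of a harsh thermal throttling policy: start at peak power,
# then cut the budget whenever the SoC temperature exceeds a target.
# Thresholds and step sizes are illustrative, not measured values.

PEAK_POWER_W = 8.5      # initial burst budget
FLOOR_POWER_W = 2.2     # lowest budget AnandTech observed
TEMP_TARGET_C = 55.0    # hypothetical throttle threshold

def next_power_budget(current_budget_w: float, soc_temp_c: float) -> float:
    """Reduce the power budget while the SoC runs hot; never go below floor."""
    if soc_temp_c > TEMP_TARGET_C:
        # Cut the budget by 15% per control step while too hot.
        return max(FLOOR_POWER_W, current_budget_w * 0.85)
    return current_budget_w

# A sustained hot workload ratchets the budget down to the floor quickly.
budget = PEAK_POWER_W
for _ in range(10):
    budget = next_power_budget(budget, soc_temp_c=60.0)
```

The point of the sketch is the shape of the policy, not its numbers: under a sustained load the budget converges on the floor, which is why real-world performance sits so far below the whitelisted benchmark results.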
The publication states that the benchmark cheating behavior has seemingly only been introduced in this year’s devices. Phones like the Huawei Mate 9 and the Huawei P10 are not affected; it appears that only devices running EMUI 8.0 and newer are. AnandTech was also told by Huawei that this was “purely a software implementation,” corroborating the publication’s findings.
AnandTech posted true GPU performance figures for the affected Kirin 970-powered phones, and the conclusion is that Huawei is significantly behind its competitors in both GPU performance and efficiency.
The publication also posted power comparison graphs for the Kirin 970 and Kirin 960-powered Huawei devices. The graphs show that Huawei’s power throttling adjustments are in fact better for the user experience as they mitigate the problem of higher power consumption. AnandTech’s testing of the Kirin 960 showed that it had “awful GPU power characteristics,” while the Kirin 970-powered devices have a new strict throttling mechanism to bring down the power consumption and temperatures.
AnandTech adds that the new throttling policy makes sense when considering the fact that both the Kirin 960 and the Kirin 970 show power draws that are much above their sustainable levels for their respective form factors.
To be clear, Huawei did nothing wrong in introducing the new throttling mechanism. The big mistake here is exempting popular benchmark applications from it via a whitelist, which is what constitutes benchmark cheating in this case.
Response: Huawei’s official statement
Huawei sent the following statement to us about benchmark cheating:
Huawei always prioritizes the user experience rather than pursuing high benchmark scores – especially since there isn’t a direct connection between smartphone benchmarks and user experiences. Huawei smartphones use advanced technologies such as AI to optimize the performance of hardware, including the CPU, GPU and NPU.
When someone launches a photography app or plays a graphically-intensive game, Huawei’s intelligent software creates a smooth and stable user experience by applying the full capabilities of the hardware, while simultaneously managing the device’s temperature and power efficiency. For applications that aren’t as power intensive like browsing the web, it will only allocate the resources necessary to deliver the performance that’s needed.
In normal benchmarking scenarios, once Huawei’s software recognizes a benchmarking application, it intelligently adapts to “Performance Mode” and delivers optimum performance. Huawei is planning to provide users with access to “Performance Mode” so they can use the maximum power of their device when they need to.
Huawei – as the industry leader – is willing to work with partners to find the best benchmarking standards that can accurately evaluate the user experience.
The key takeaway here is that the company is planning to provide users with access to “Performance Mode” (Meizu-style) so that the users can use the “maximum power” of their device “when they need to.”
Response: UL delists the affected Huawei and Honor phones in its benchmarks
UL, having acquired Futuremark (the company behind PCMark and 3DMark), has delisted the Huawei P20, Huawei P20 Pro, Huawei Nova 3, and the Honor Play from 3DMark. The company has verified benchmark cheating on the Huawei P20 Pro, Huawei Nova 3, and the Honor Play. On the basis of AnandTech’s testing and reporting, it has also delisted the standard Huawei P20. Users will no longer be able to view the benchmark results of the affected phones as the company does not wish to host cheated benchmark scores.
The company found that the scores from the public 3DMark app were up to 47% higher than the scores from the private app (which is not available to the public), despite the fact that the tests are identical.
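For scale, that 47% figure is worth translating. A quick check (the 2,000-point private-app score here is an arbitrary example, not an actual 3DMark result):

```python
# If the public app scores 47% higher than the private (undetected) app,
# the honest score is public / 1.47. The 2,000-point starting figure is
# an arbitrary example, not a real 3DMark result.

private_score = 2000
public_score = private_score * 1.47   # inflated by detection-triggered boost

# Share of the public score that is pure inflation:
inflation_fraction = (public_score - private_score) / public_score
```

In other words, when the gap is 47%, roughly a third of the public score is attributable to the detection-triggered boost rather than to performance a user would ever see.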
In the announcement, UL added that it was happy to see Huawei committing to adopt a more transparent approach in the future. UL’s view is that optional performance modes that can be set by the user are allowed under its current rules as long as they are disabled by default. The company states: “A device must run the benchmark as if it were any other application.”
In conclusion, GPU scores for these Huawei devices obtained with publicly available benchmark apps should not be taken as representative of actual performance.
Response: UL and Huawei issue a joint statement
In response to UL’s decision to delist the Huawei P20, Huawei P20 Pro, Huawei Nova 3, and the Honor Play from 3DMark, Huawei reached out to UL to discuss the best practices for benchmark testing. Here is the statement offered to us:
Huawei and UL (creators of 3DMark) have held comprehensive discussions on benchmarking practices this week, and have reached a positive agreement on the next steps in working together.
In the discussion, Huawei explained that its smartphones use an artificial intelligent resource scheduling mechanism. Because different scenarios have different resource needs, the latest Huawei handsets leverage innovative technologies such as artificial intelligence to optimize resource allocation in a way so that the hardware can demonstrate its capabilities to the fullest extent, while fulfilling user demands across all scenarios.
UL understands the intent of Huawei’s approach, but is opposed to forcing the use of a “Performance Mode” by default when a benchmarking application is detected by the device. UL rules require a device to run the benchmark as if it were any other application.
Huawei respects consumers’ right to choose what to do with their devices. Therefore, Huawei will provide users with open access to “Performance Mode” in EMUI 9.0, so that the user can choose when to use the maximum power of their device.
Huawei and UL have also discussed current common benchmark testing methodologies in general. UL and Huawei would like to participate in an industry movement to develop benchmarking standards that best serve the needs of manufacturers, press, and consumers.
To prevent confusion around current benchmarking results, after discussion, UL and Huawei have temporarily delisted the benchmark scores of a range of Huawei devices, and will reinstate them after Huawei grants all users of Huawei handsets access to the Performance Mode.