Improving Techtuber Benchmarking Visualization

juj, 2024-08-07

It is the new CPU season and AMD has just released new Zen 5 CPUs. Techtubers are running benchmarks to measure the performance of the new SKUs that we are getting.

When the consideration is only about raw performance, these reviewers usually do a great job. However, sometimes the attention in these benchmarks turns to tradeoffs, for example performance vs power, or performance vs price. This is where YouTubers could do better.

With Zen 5, it looks like AMD has decided to allocate their process node improvements largely to reducing power consumption, rather than to improving CPU speed out of the box. As a result, talking about these CPUs does not make sense unless both performance and power consumption are investigated together.

Unfortunately the visuals that YouTubers produce to perform these comparisons are not ideal. For example, Hardware Unboxed produced the following two charts:

This kind of presentation is not optimal. It is not easy for the viewer to combine the information in these two graphs, or to distinguish between objective and subjective value just by staring at them. Doing so takes time and cross-referencing between the two graphs. For example, which products should I look to purchase if I wanted both the best Cinebench score and the lowest power consumption? Is the Ryzen 9 7900X a good buy under these two metrics? Not so easy to answer even after pausing to examine the data for a while, eh?

Then, after presenting the data, the reviewers typically go on to present their own value judgement of the product to the viewer.

Unfortunately, when these value propositions are being made, things go a bit wrong. The problem is that this plays into the notion that a value judgement would somehow be the same for everyone, i.e. that a "sweet spot of value" could be decided objectively as well. This is not at all the case.

How can we improve then?

Well, the answer here is to use a better visualization called Pareto Frontier Graphs.

These graphs are 2D X-Y scatter plots that have a particularly pleasing property: they separate the objectively superior products from the subjective value analysis, giving the viewer an easy way to quickly find the "optimal set of products". From this optimal set, viewers can then easily make their own subjective value judgement.

To explain what this means, it is simplest to just look at such a Pareto Frontier X-Y scatter plot. To do so, I have taken the same data from the above two 1D graphs from Hardware Unboxed, and revisualized it as an X-Y scatter plot with the Pareto frontier visible. Let's take a look.

This 2D graph puts the raw data in the two previous 1D bar charts from Hardware Unboxed together to plot each CPU on the achieved score vs consumed power scale.

What can make this graph disorienting at first is that the viewer needs to find in which direction (which corner) the best products are located. In this case, the best products lie towards the top-left corner.

Interpreting an X-Y Scatter Plot Pareto Graph

There are three interesting aspects that a graph like this provides to the viewer.

For each item, the objectively vs subjectively better and worse products can be found in one of the four quadrant directions relative to that item. For example:
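The quadrant rule can be written down as a tiny classification function. This is a minimal Python sketch, assuming a higher score and lower power are better (so "up" and "left" are good); the `Cpu` type, the `quadrant` helper and all numbers are hypothetical illustrations, not benchmark data.

```python
from typing import NamedTuple


class Cpu(NamedTuple):
    name: str
    score: float  # benchmark score: higher is better (up on the plot)
    watts: float  # power consumption: lower is better (left on the plot)


def quadrant(item: Cpu, other: Cpu) -> str:
    """Classify `other` by which quadrant around `item` it falls in."""
    if other.score == item.score and other.watts == item.watts:
        return "equivalent"
    if other.score >= item.score and other.watts <= item.watts:
        return "objectively better"   # top-left quadrant
    if other.score <= item.score and other.watts >= item.watts:
        return "objectively worse"    # bottom-right quadrant (the gray one)
    return "subjective tradeoff"      # top-right or bottom-left quadrant


# Hypothetical numbers for illustration only, not benchmark results.
ref = Cpu("Ref", score=1000, watts=100)
for other in [Cpu("Faster+frugal", 1100, 90), Cpu("Slower+hungry", 900, 120),
              Cpu("Faster+hungry", 1200, 150), Cpu("Slower+frugal", 800, 60)]:
    print(other.name, "->", quadrant(ref, other))
```

Only the two diagonal quadrants give an objective answer; the other two are where personal preference decides.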

This allows one to quickly identify visually which products are objectively optimal (or objectively inferior), and which products one might prefer, albeit only subjectively. (Note how the visualization above shades the bottom-right quadrant in gray; more on that below.)

The second interesting idea in the graph is the solid black axis-aligned "zig-zag line" that runs through the graph. This is the Pareto Frontier. It represents the set of optimal products on the measured axes. The CPUs that fall on this line are "the best" products on the combined Cinebench 2024 score vs consumed power axes.
In particular, there are seven products that live on this line:
- AMD Ryzen 5 5600X
- AMD Ryzen 7 7800X3D
- AMD Ryzen 7 7700
- AMD Ryzen 7 9700X
- AMD Ryzen 9 7900X3D
- AMD Ryzen 9 7950X3D
- AMD Ryzen 9 7950X
What this means is that if a consumer cares specifically about Cinebench 2024 scores and power consumption (W), these seven products are the only ones worth considering for purchase. Note how visually clear this information is, compared to trying to extract it by mentally combining the data in the two separate 1D bar charts shown earlier on this page.

Collectively, these seven "objectively optimal" CPUs shade an area in the bottom-right side of the graph (with a light gray background). CPUs that fall in this shaded area are objectively suboptimal when compared on these two metrics. For example, the above image shows how the Core i9-12900K is located in the gray, and would therefore be a suboptimal purchase on the Cinebench-vs-Watts axes, because the Ryzen 9 7950X exists, achieving both a better score and lower power consumption.
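Finding the frontier boils down to a dominance check: a CPU is Pareto optimal if no other CPU is at least as good on both axes and strictly better on at least one. Below is a minimal Python sketch of that rule, assuming score-up/power-down is better; `Cpu`, `dominates`, `pareto_frontier` and the data points are hypothetical, not the chart's actual data.

```python
from typing import NamedTuple


class Cpu(NamedTuple):
    name: str
    score: float  # higher is better
    watts: float  # lower is better


def dominates(a: Cpu, b: Cpu) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on at least one."""
    no_worse = a.score >= b.score and a.watts <= b.watts
    strictly_better = a.score > b.score or a.watts < b.watts
    return no_worse and strictly_better


def pareto_frontier(cpus: list[Cpu]) -> list[Cpu]:
    """Return the non-dominated CPUs, ordered from lowest to highest power."""
    optimal = [c for c in cpus if not any(dominates(o, c) for o in cpus)]
    return sorted(optimal, key=lambda c: c.watts)


# Hypothetical data points, not measurements.
cpus = [Cpu("A", 100, 60), Cpu("B", 150, 90), Cpu("C", 140, 120),
        Cpu("D", 200, 150), Cpu("E", 90, 80)]
print([c.name for c in pareto_frontier(cpus)])  # ['A', 'B', 'D']
```

Here C (beaten by B) and E (beaten by A) land in the gray area. The pairwise check is quadratic, which is plenty fast for a few dozen CPUs.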

So if a user wanted to make a purchase caring only about these two metrics (Cinebench 2024 score and power consumption), the best decision would be to choose any one of the seven "Pareto optimal" CPUs above, subject to their own preferences.

Further, what is pleasing about the Pareto frontier presentation is that it also provides perspective on different subjective tastes. For example, see above how the Ryzen 9 7950X provides a slightly better score than the Ryzen 9 7950X3D, but at a larger increase in power consumption. Is that worth it? That is a subjective call that depends on the user's own preferences; there is no one correct value call. In this manner a Pareto Frontier visualization makes it crystal clear what is objective and what is subjective.

What is great is that the positions of these two products in the 2D plot show this tradeoff geometrically, in clear perspective. "Halo" products are immediately identified in context, because their tradeoffs can be visually quantified.

The third interesting idea is that this kind of Pareto graph can highlight, at a glance, the generational improvement that a given SKU upgrade provides.

A green line is drawn from the previous-generation Ryzen 7 7700X to the Ryzen 7 9700X, to highlight the shift that the generational improvement provided. The 2D area shaded in green visually identifies how much competitiveness this new SKU brings to the market.

As can be seen from this green shaded area, the improvements that the new 9700X brings (in the CB2024 vs power comparison) are not particularly large. The change is, however, enough to make the 9700X relevant again in Cinebench 2024: see how the earlier 7700X SKU was in the "gray" zone, meaning that it had been rendered inferior by the Ryzen 9 7900X3D.

TechTubers are probably right to say that the newly released CPU is a bit underwhelming. It "conquers" only a small amount of green area in the Cinebench score vs Power consumption comparison. A Pareto Frontier chart quantifies this amount perfectly in context with all the other CPUs.
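One way to put a number on the green area, assuming it corresponds to the region that the new SKU newly pushes into the gray (i.e. the growth of the dominated region), is the 2D "hypervolume" measure from multi-objective optimization. The Python sketch below is an illustration under that assumption; `dominated_area`, `green_area`, the reference corner and the points are all hypothetical.

```python
def dominated_area(points, ref_watts, ref_score):
    """Area of the region dominated by `points` (the frontier plus the gray zone).

    `points` are (watts, score) pairs; lower watts and higher score are better.
    The region is clipped to watts <= ref_watts and score >= ref_score.
    """
    pts = sorted((w, s) for w, s in points if w <= ref_watts and s >= ref_score)
    area = 0.0
    best = ref_score  # highest score reachable at or below the current wattage
    for i, (w, s) in enumerate(pts):
        best = max(best, s)
        next_w = pts[i + 1][0] if i + 1 < len(pts) else ref_watts
        area += (next_w - w) * (best - ref_score)
    return area


def green_area(existing, new_point, ref_watts, ref_score):
    """How much extra area a new SKU adds to the dominated region."""
    before = dominated_area(existing, ref_watts, ref_score)
    after = dominated_area(existing + [new_point], ref_watts, ref_score)
    return after - before


# Hypothetical (watts, score) points, not measurements.
old = [(60, 100), (90, 150), (150, 200)]
print(green_area(old, (80, 140), ref_watts=200, ref_score=0))   # 400.0
print(green_area(old, (100, 120), ref_watts=200, ref_score=0))  # 0.0 (dominated SKU)
```

A SKU that lands in the gray gains exactly zero area. The absolute value depends on the chosen reference corner and is in "watts × points", so it is only meaningful when comparing SKUs on the same chart.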

Also notice how this kind of visualization allows one to speculate about future tradeoffs. Where will the new Ryzen 9900X and 9950X have to fall in order to be relevant? They will need to be located somewhere in the top-left white area of the chart; otherwise they will be inferior to an already existing SKU.

Second Example

Phoronix conducted an extensive benchmark review of the Zen 5 Ryzen 7 9700X and Ryzen 5 9600X CPUs.

In their conclusion, they published one 1D bar chart of the geometric mean score across all their tests, and another 1D percentile plot of the power consumption of the various CPUs.
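As background on that summary score: a geometric mean is a common way to combine benchmark results that are measured in unrelated units. The Python sketch below (hypothetical numbers, not Phoronix data; their exact normalization is not described here) uses the standard library's `statistics.geometric_mean` to show the property that makes it suitable.

```python
import math
from statistics import geometric_mean

# Hypothetical per-test results (higher is better) in unrelated units; not Phoronix data.
cpu_a = [120.0, 3.5, 48000.0]
cpu_b = [150.0, 3.0, 60000.0]

ratio_of_means = geometric_mean(cpu_b) / geometric_mean(cpu_a)
mean_of_ratios = geometric_mean([b / a for a, b in zip(cpu_a, cpu_b)])

# The two agree, so the comparison does not depend on each test's units or scale;
# an arithmetic mean would instead be dominated by the 48000-point test.
print(round(ratio_of_means, 4), round(mean_of_ratios, 4))
print(math.isclose(ratio_of_means, mean_of_ratios))  # True
```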

There is a lot of data here, but it suffers from the same limitation as the previous Hardware Unboxed example.

Let's visualize this data too as a 2D Pareto Front graph:

Wow, so much more insightful. From this data we can see that the Ryzen 7 9700X performed quite well in the Phoronix Linux test suites, while the 9600X got directly shadowed by the 9700X. See how the green shaded area is now slightly bigger and the Ryzen 9 7900 and Ryzen 5 9600X CPUs fall inside this area? This means that under these two metrics (avg test score vs avg power consumption), the Ryzen 7 9700X makes these two other CPUs suboptimal.

Also see how the shift along the green line (7700X → 9700X) is much longer? This data suggests that, under the workloads that Phoronix tested, the generational improvement from the 7700X to the Ryzen 7 9700X is much larger. The Ryzen 7 9700X behaves similarly to the Ryzen 9 7900. Head over to Phoronix's article "AMD Ryzen 5 9600X & Ryzen 7 9700X Offer Excellent Linux Performance" to read their full review.

Third Example

In addition to the previous 1D bar charts, in the same Zen 5 review video Hardware Unboxed published the following chart:

They divided the current price of each CPU by the average FPS that the CPU achieves in games.

This visualization represents a common maths mistake. Superficially it might seem that this division would produce useful data, but it is hard to overstate how bad the decisions are that this kind of mathematics can lead to, and it cannot distinguish between objective and subjective performance. To avoid getting too sidetracked here, refer to e.g. the cpubenchmark.net CPU Value chart to see how meaningless the resulting data is. This is not proper mathematics.

It would be reasonable to compute a performance-per-dollar ratio like this IF users were able to stack more and more CPUs into a PC linearly, like hard drives in a data center. But since a user can only have one CPU, i.e. there is no scalability, this data grossly misleads the user on both objective and subjective value grounds.
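A small counterexample makes the problem concrete. In this Python sketch (three hypothetical products, not real prices or FPS), ranking by price/FPS places an objectively inferior product above a Pareto-optimal one; `Cpu` and `dominates` are illustrative helpers.

```python
from typing import NamedTuple


class Cpu(NamedTuple):
    name: str
    price: float  # USD: lower is better
    fps: float    # average FPS: higher is better


def dominates(a: Cpu, b: Cpu) -> bool:
    """True if `a` is no worse than `b` on price and FPS, and strictly better on one."""
    return (a.price <= b.price and a.fps >= b.fps
            and (a.price < b.price or a.fps > b.fps))


# Hypothetical products, not measured data.
cpus = [Cpu("A", 100, 50), Cpu("C", 120, 45), Cpu("B", 400, 120)]

# Ranking by "cost per frame" (lower first), as in the price/FPS chart.
by_ratio = [c.name for c in sorted(cpus, key=lambda c: c.price / c.fps)]
# Pareto-optimal set: products that no other product beats on both axes.
optimal = [c.name for c in cpus if not any(dominates(o, c) for o in cpus)]

print(by_ratio)  # ['A', 'C', 'B']
print(optimal)   # ['A', 'B']
```

C ranks second by cost per frame even though A is both cheaper and faster, while B ranks last although nothing beats it on both axes. Whether B's extra FPS is worth its price is exactly the subjective call that the ratio hides.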

There is a proper way to perform Performance per Dollar analysis however. Yes, that is the 2D X-Y Pareto Frontier Scatter Plot again.

Fortunately Hardware Unboxed provided the source data they derived their calculations from, so let's revisualize that information for a better look.

Notice how the relative performance positioning of the different SKUs immediately becomes obvious, and one can at a glance distinguish between the objectively superior CPUs that reside on the Pareto line (7600X, 9600X, 7700X, 14600K, 7800X3D) versus the objectively inferior CPUs in the gray shaded area (5800X3D, 9700X, 14700K).

Not only that, this 2D graph also clearly shows how much worse the inferior SKUs are. The deeper a CPU is "in the gray", the worse it is. From this graph it is also obvious that the 9700X is suboptimal due to the existence of the 14600K, which provides slightly more fps at a $60 lower price. If it weren't for that Intel CPU, the 9700X would sit on the Pareto line between the 7700X and 7800X3D in the product stack.

It is really neat to see how cleanly this kind of graph filters out the objectively suboptimal products at a glance. The 9700X does not make the cut, so it is left "in the gray". The 9600X, however, does fall on the optimal Pareto front line, even if just barely.

This graph also helps viewers extrapolate: the prices at their own local retailer may be a bit different. Prices are not set in stone, so instead of relying on the value judgement made at the time of the video's release, one can replace the price of the 9700X with the price at a nearby retailer and still do a value analysis by visually relocating the SKU. Does the 9700X perhaps cost $240 at your retailer a few months after release? Just take a peek at this graph to see where the CPU would then fall relative to the other CPUs, and whether it would be worth purchasing at that price.
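The same relocation can be done numerically. Only CPUs that are at least as fast can dominate a given CPU, so the cheapest of them caps the price at which it stays on the Pareto line. This Python sketch assumes that rule; `max_competitive_price` and the market data are hypothetical.

```python
from typing import NamedTuple, Optional


class Cpu(NamedTuple):
    name: str
    price: float  # USD: lower is better
    fps: float    # average FPS: higher is better


def max_competitive_price(fps: float, others: list[Cpu]) -> Optional[float]:
    """Price a CPU delivering `fps` must stay below to avoid being dominated.

    Only CPUs that are at least as fast can dominate it, so the cheapest of those
    sets the cap (at that exact price it is dominated, or at best tied).
    Returns None if no other CPU is as fast, i.e. any price keeps it on the frontier.
    """
    caps = [o.price for o in others if o.fps >= fps]
    return min(caps) if caps else None


# Hypothetical market, not real prices or benchmark results.
market = [Cpu("X", 200, 100), Cpu("Y", 300, 130), Cpu("Z", 450, 150)]
cap = max_competitive_price(125, market)
print(cap)  # 300
local_price = 240  # hypothetical price at your retailer
print(local_price < cap)  # True: at this price it would sit on the Pareto line
```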

Interactively hovering over the 9600X gives a simple indication of the subjective value (or lack thereof) that it provides. We see that the price of the new 9600X would perhaps have to be about $80 lower for it to make sense for the presented gaming workload.

Summary

In this new CPU season, a lot of new benchmarks will be published. It is clear that AMD's new CPUs need to be measured on the performance-vs-watt and performance-vs-dollar efficiency scales. However, there is only one way to do this that is both visual and combines two metrics simultaneously, and it is not dividing numbers together or showing a series of 1D bar charts. It is the 2D Pareto frontier scatter plot.

Please ask your fellow TechTubers to improve their data visualization literacy so that we can one day level up to watching YouTube videos with more informative data statistics. 🙏

-juj

Update 2024-08-09: Bonus graph: I took the above Phoronix benchmark scores and plotted them against the price of each CPU obtained from Amazon.com on 9 August 2024 to produce a 2D CPU value comparison chart. Check this out:

In the above graph, an orange dotted line connects the Zen 3 CPU family, a magenta dotted line links the Zen 4 CPU family, and a blue dotted line links the new Zen 5 CPU family. This helps visualize the Zen 3 → Zen 4 upgrade value proposition, and the Zen 4 → Zen 5 upgrade value proposition (or lack thereof).

Under these workloads, one can immediately see that the Ryzen 7 9700X loses in value to the Ryzen 9 7900X. They both have the same price ($359), but the 7900X scored 92.55 pt and the new 9700X only 90.08 pt. The 9700X is much more power efficient, so it likely has better PBO (Precision Boost Overdrive) headroom, but if you will not use PBO and will instead always run the CPU at stock power limits, then the earlier Ryzen 9 7900X would be the better purchase.

The Ryzen 5 9600X on the other hand shows objective value over the Ryzen 7 7700X. It takes a spot between the Core i5-13600K and the Ryzen 9 7900X on the Pareto line.
