Making a Fair Gfx Benchmarking Utility?
Moggie68 asks: "Whenever the big two release new GPUs and graphics cards that reach astounding heights with their benchmark scores, the same heated debate about unfair benchmarking utilities rises again. But what about the flip side of the coin? Would it really be that easy to construct a fair benchmarking utility for GPUs and graphics cards? What facts need to be considered? What problems solved?"
Stick to the games (Score:4, Insightful)
Here's the problem: ATI and NVidia have diverged a bit. They get performance upgrades from different optimizations/workflows. For this reason, performance is more a question of which card the game developer favors than it is about which card is better. Granted, what I'm saying isn't quite as black and white as that, but it's worth considering that if the benchmark uses an optimization that the game doesn't, then the benchmark is misleading.
I don't find video card benchmarks interesting, but I do enjoy CPU benchmarks. I'm a 3D artist, so render speed is very important to me. I recently had to go through the "Do I want a P4 or Athlon?" debate. Lightwave comes with benchmark scenes. You're supposed to load the scene, hit the render button, and write the number down. Some decent sites actually do the benchmark that way. That is a selling point for me, not the rest of those idiotic benchmarks that they throw in there. Yeah, like I care about how fast Office is.
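To make that "load the scene, hit render, write the number down" methodology concrete, here's a minimal timing sketch in Python. The renderer is a hypothetical stand-in (a real test would launch Lightwave on one of its bundled benchmark scenes); the point is simply wall-clock timing of the actual workload, with a few repeats to damp background noise:

```python
import time

def benchmark_render(render_fn, runs=3):
    """Time a real-world render: do the work, write the number down.
    Reports the best of several runs to damp background-process noise."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        render_fn()                      # the actual work being measured
        times.append(time.perf_counter() - start)
    return min(times)

# Hypothetical stand-in for a renderer; a real test would load a scene file.
best = benchmark_render(lambda: sum(i * i for i in range(100_000)))
print(f"best render time: {best:.4f}s")
```

Nothing clever, which is the point: the number you get is the number you'd see doing real work.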
I hope my point got across. Real world numbers are gold, theoretical numbers are pyrite.
One thing most benchmark folk miss (Score:3, Interesting)
Why? Because they don't have antivirus software running in the background. AV software running in the background can change results significantly.
In most offices, the desktop PCs have AV software installed. If they don't have AV software installed, they usually have worms and viruses and those tend to take up more CPU.
That's real world.
Which AV software to use in the benchmark is one question that they may not want to deal with.
But, hey, doesn'
Re:One thing most benchmark folk miss (Score:2)
Benchmarks are, or should be taken as, just guidelines.
In the real world there are a huge number of variables: old DLL files from previous drivers, IM clients running in the background, stale entries in boot config files that still affect performance, stuff hanging around since the last clean reboot, the physical environment, etc.
OK, So here's what we do: (Score:3, Interesting)
We take a bunch of gamers and group them by what video card they own. We give each of them the test board. After one month we take away the test board and give them their old one back. The benchmark is: How many out of 10 owners of board X would buy the test board? Because that's what you really want to know, right? And who better to tell you this than people who own the same board you do?
Can't be done if driver authors want to skew it.. (Score:4, Informative)
The problem here is that the Windows driver model allows the driver to check what program is making calls into it. This is not a bad thing by itself, so I wouldn't advocate getting rid of it.
So.. let's say you make a new benchmarking program and you don't leak any copies to the graphics people. What happens when you release it? It might work and be fair on the current batch of drivers.. but as soon as the graphics people get their hands on it, there's nothing you can do to prevent them from "optimizing" (tuning down rendering) for your benchmark.
So maybe you can make a fair benchmark today. But as soon as you give it to anyone, don't bet on it being fair on the next driver revision.
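To illustrate the kind of check being described, here's a rough sketch in Python rather than actual driver code. The blocklist entries are made up, and the check is stood in by this process's own executable name; a real driver would inspect the name of the process making calls into it:

```python
import os
import sys

# Hypothetical blocklist a driver vendor might ship; any real list
# would be private to the vendor.
KNOWN_BENCHMARKS = {"3dmark03.exe", "benchmark.exe"}

def caller_is_benchmark():
    """Illustrative sketch of the check a display driver can perform:
    look at the name of the executable making the API calls. Here we
    stand in with this process's own interpreter name."""
    exe = os.path.basename(sys.executable).lower()
    return exe in KNOWN_BENCHMARKS

def choose_render_path():
    # A driver could silently swap in a cheaper rendering path
    # whenever it thinks a benchmark is watching.
    return "reduced-quality fast path" if caller_is_benchmark() else "normal path"

print(choose_render_path())
```

Trivial to write, and invisible to the benchmark itself, which is exactly why a released benchmark can't stay fair.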
-molo
Re:Can't be done if driver authors want to skew it (Score:3, Interesting)
Re:Can't be done if driver authors want to skew it (Score:2)
Re:Can't be done if driver authors want to skew it (Score:2)
DRM? (Score:1)
then how about benchmarking in Linux or FreeBSD. They both support Direct Rendering Manager
I thought Microsoft was using Linux's and FreeBSD's non-support of DRM as a selling point for Windows.
Oh, that DRM.
Re:Can't be done if driver authors want to skew it (Score:3, Interesting)
-molo
Re:Can't be done if driver authors want to skew it (Score:2)
Hey, this ain't MSDN. Get your priorities straight!
Re:Can't be done if driver authors want to skew it (Score:2)
Re:Can't be done if driver authors want to skew it (Score:1)
Fair Benchmarks (Score:4, Insightful)
Re:Fair Benchmarks (Score:2)
However, benchmarks are a good ballpark guide to whether I should buy an ATI or Nvidia. I am so glad I read the benchmarks when I made the ATI vs Nvidia decision recently. This time ATI won me over for price and performance. Previously it was NVidia. Who knows who will win next round. It seems that just as soon as you have
Those who do not learn from history are doomed to (Score:4, Interesting)
Does anyone still care about MIPS, MFLOPS, Dhrystone, Whetstone, or SPEC? Why do we want to rehash history with GPUs?
If you want a synthetic benchmark, the companies will make their product work well with the benchmark, and little else. When the inevitable happens (as it has with both major players), you should neither get upset nor demand a better benchmark; instead, laugh when someone fronts a synthetic benchmark score.
So you want to know if a card you're going to buy will work well for a game that won't come out for six months to a year. We'd all like to know the future; personally, I'd prefer a crystal ball.
Mutual generation of fair tests (Score:4, Interesting)
Then run all four suites on both systems and take normalized averages. The best system can win only by being robust and of overall high performance. With four tests in all, the vendor's own "best foot forward" suite can't overweight the result. And with the other vendor looking for any weaknesses, the downsides of each vendor's system become quite evident.
Such testing may not produce over-optimized one-application super-stars, but it should lead to well-rounded graphics boards for high performance on a range of graphical display tasks.
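The scoring step above can be sketched in a few lines of Python. The suite names and raw frame rates below are invented for illustration; the idea is just to normalize each suite against its best result so no single suite can dominate, then average:

```python
def normalized_average(scores_by_suite):
    """scores_by_suite maps suite name -> {system: raw score}.
    Normalize each suite so its best system scores 1.0, then average
    across suites, so no one suite can overweight the result."""
    systems = {}
    for suite, scores in scores_by_suite.items():
        top = max(scores.values())
        for system, raw in scores.items():
            systems.setdefault(system, []).append(raw / top)
    return {s: sum(v) / len(v) for s, v in systems.items()}

# Hypothetical raw frame rates from the four suites (two per vendor).
results = normalized_average({
    "vendorA_best":  {"A": 200, "B": 120},
    "vendorA_worst": {"A": 150, "B": 140},
    "vendorB_best":  {"A": 90,  "B": 180},
    "vendorB_worst": {"A": 100, "B": 95},
})
print(results)
```

A system that wins one suite by a mile but tanks on another ends up behind a system that's merely solid everywhere, which is the intended incentive.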
I bet that ATI and NVidia will never go for this approach because it would lead to real head-to-head fair competition as opposed to carefully staged, optimized, marketing-controlled demos.
Re:Mutual generation of fair tests (Score:1)
Trying to crash the other vendor's system is OK (Score:2)
These would be true test suites as opposed to nice speed demo suites. As a graphics board customer, I do want speed. But I would probably say that robustness has a higher implicit priority. A graphics chip that crashes is the last thing I want, regardless of how fast it is on some more limited
Re:Trying to crash the other vendor's system is OK (Score:1)
Here's the thing: when you run a game that crashes the graphics chips, you don't patch the drivers, you patch the game. Writing drivers that will survive running malicious code takes time away from addressing other programming issues, and the thing is that no one except for your competitor is writing that kind of code into their app.
Worms (no, not the game) (Score:2, Interesting)
Writing drivers that will survive running malicious code takes time away from addressing other programming issues, and the thing is that no one except for your competitor is writing that kind of code into their app.
What if somebody finds a way to break Windows through a video driver bug? What if somebody puts that exploit into the next Windows worm?
The more fundamental problem is that all any kind of test can ever measure is your ability to do well at that test.
And if that test measures a video car
Re:Mutual generation of fair tests (Score:1)
Then you'll have driver manufacturers figuring out a way to disable the sleep system call....
Sleep calls are OK (Score:2)
Sleep calls cannot bias the results unless the two cards have different definitions of "sleep." Bypassing sl
Re:Sleep calls are OK (Score:1)
Benchmarks and Subjectivity (Score:3, Insightful)
The performance question is harder. But again, popular games level the playing field. When you benchmark using a game you know that programmers are actually using the features you are testing.
And finally, there is the matter of cheating. If a manufacturer is noticeably decreasing image quality for frame rate, he is usually "cheating." When image quality is maintained, it is an optimization. So again, it becomes a matter of subjective judgments of the human eye.
Subjective judgments are not so bad of course. A five star restaurant is only subjectively better than a two star restaurant. But usually that will mean a lot to the customer. So we can tolerate the errors that come from benchmarking cards from games pretty well. When manufacturers pull their tricks, you can bet that the review sites will be there to catch them.
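One way review sites can back up the "eyeball test" with a number is to diff screenshots of the same scene rendered by different driver revisions. A minimal sketch, using flat lists of pixel values in place of real screenshots (a real check would diff full captures, and the threshold for "noticeable" remains a subjective call):

```python
def mean_abs_diff(frame_a, frame_b):
    """Compare two frames (flat lists of 0-255 pixel values) rendered
    from the same scene. A large mean difference hints that one driver
    is trading image quality for frame rate."""
    assert len(frame_a) == len(frame_b)
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

# Toy frames; a real check would diff full screenshots pixel by pixel.
reference = [100, 120, 130, 140]
suspect   = [100, 118, 131, 90]   # one region noticeably degraded
print(f"mean abs diff: {mean_abs_diff(reference, suspect)}")
```

The number doesn't settle the optimization-versus-cheating question, but it tells you when there's something worth putting in front of human eyes.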
Cheating 101 (Score:1)
Re:Cheating 101 (Score:1, Interesting)
I fail to see the problem (Score:3, Informative)
I can't see why you'd care whether a vendor is "cheating" or not. Let's say that you're a Tribes 2 fan. You run out and look at Tribes 2 benchmarks in reviews. The reviewer says something about image quality, and includes bits of screenshots (I vaguely remember this happening with the Riva128 and G200 the last time I purchased a 3D card for gaming). End of story.
Now, there are a couple of possibilities. First, both you and the reviewer can't see the image quality degradation that's taking place, and you do notice the speed increase. That's not cheating! The card vendor has just figured out a way to provide you with more resources that you care about at the cost of something that you don't even notice. We do this all the time with lossy compression in JPEG and MP3 -- you don't care about 90% of the data, but you do care about the size savings. People didn't care when lossy texture compression became the standard on video cards because the only thing that lossless compression gives them is a psychological "this is a flawless image".
Another possibility is that the reviewer or you notice image quality degradation. If this is the case, the card gets a lower image quality score. Big deal!
Finally, you may be worried about game-specific tweaking in that the game won't provide a representative sample of how the card will do on other games. This is *always* the case! Cards could perform quite differently on any set of games just due to the fact that designs differ, and different things form a bottleneck on different cards in different games.
Just let some reviewer sit down and try the stupid card out, and if they're enjoying the card...hey, who cares what hacks are included in the driver?
Re:I fail to see the problem (Score:2)
That is, the game itself will NOT run as the benchmark portrays, the tweaks being useless for normal gaming.
If there were any sanity in how the drivers act, then from a programmer's point of view the program should be tweaked for the drivers, not the other way around (the driver should just do what the spec says, and do it exactly). I fail to see the point in whoring the driver
Re:I fail to see the problem (Score:2)
Also, it's going to get to the point where the cheating could be that the driver detects when a screenshot is taken and then boosts the quality for the current frame.
I haven't heard of it happening, but that's where it's going to get to.
Re:I fail to see the problem (Score:2)
Someone figured out that, two or three releases ago, the NVidia Detonators did exactly that: detecting screenshots and boosting the quality for that frame.
Unfortunately, it is difficult to determine if the drivers are still exhibiting that behavior, because Nvidia now supplies drivers where the code is encrypted, and decrypted in a 'just-in-time' fashion.
Sketchy. Very very sketchy. ATI for me.
Bleh people have it all wrong. (Score:1)
The card that correlates to the most wins is obviously the superior video card.
Objective benchmarking (Score:2)
Once that's out of the way, the next step is to crank up scene complexity until the rendering rate drops. Crank up the polygon count, the texture count, the shader count, etc. until the card misses a frame refresh time. That's what matters when you're running 3D applications. It's also what m
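The "crank up complexity until the card misses a refresh" procedure can be sketched as a simple loop. The cost model below is a made-up stand-in for a real renderer (which would actually push polygons and read back frame times); the structure is what matters:

```python
def max_complexity(render_time_fn, refresh_hz=60):
    """Increase scene complexity until a frame takes longer than one
    refresh interval, and report the last level that fit.
    render_time_fn(level) -> seconds per frame (hypothetical model)."""
    budget = 1.0 / refresh_hz           # e.g. ~16.7 ms at 60 Hz
    level = 1
    while render_time_fn(level) <= budget:
        level += 1
    return level - 1                    # last level that never missed refresh

# Hypothetical cost model: frame time grows linearly with scene load.
cost = lambda level: level * 0.001      # 1 ms per complexity level
print(max_complexity(cost))
```

The resulting number (maximum sustainable scene complexity at a given refresh rate) is hard for a driver to game per-application, since the scene is generated by the benchmark rather than drawn from a known game.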
It's easy to prevent bad drivers. (Score:2)