
Making a Fair Gfx Benchmarking Utility?

Moggie68 asks: "Whenever the big two release new GPUs and graphics cards that reach astounding heights with their benchmark scores, the same heated debate about unfair benchmarking utilities flares up again. But what about the flip side of the coin? Would it really be that easy to construct a fair benchmarking utility for GPUs and graphics cards? What facts need to be considered? What problems need to be solved?"
This discussion has been archived. No new comments can be posted.

  • Stick to the games (Score:4, Insightful)

    by NanoGator ( 522640 ) on Friday September 19, 2003 @08:19PM (#7009290) Homepage Journal
    Just stick to using popular games. Seriously.

    Here's the problem: ATI and NVidia have diverged a bit. They get performance upgrades from different optimizations/workflows. For this reason, performance is more a question of which card the game developer favors than it is about which card is better. Granted, what I'm saying isn't quite as black and white as that, but it's worth considering that if the benchmark uses an optimization that the game doesn't, then the benchmark is misleading.

    I don't find video card benchmarks interesting, but I do enjoy CPU benchmarks. I'm a 3D artist, so render speed is very important to me. I recently had to go through the "Do I want a P4 or Athlon?" debate. Lightwave comes with benchmark scenes. You're supposed to load the scene, hit the render button, and write the number down. Some decent sites actually do the benchmark that way. That is a selling point for me, not the rest of those idiotic benchmarks that they throw in there. Yeah, like I care about how fast Office is.

    I hope my point got across. Real world numbers are gold, theoretical numbers are pyrite.
    • Those typical office/desktop benchmarks aren't real world.

      Why? Because they don't have antivirus software running in the background. AV software running in the background can change results significantly.

      In most offices, the desktop PCs have AV software installed. If they don't have AV software installed, they usually have worms and viruses and those tend to take up more CPU.

      That's real world.

      Which AV software to use in the benchmark is one question that they may not want to deal with ;).

      But, hey, doesn'

      • Benchmarks are, or should be taken as, just guidelines.

        In the real world there are a huge number of variables: old DLL files from previous drivers, IM clients running in the background, stale entries in boot config files that still affect performance, stuff hanging around since the last clean reboot, the physical environment, etc.
    • OK, so here's what we do:

      We take a bunch of gamers and group them by what video card they own. We give each of them the test board. After one month we take away the test board and give them their old one back. The benchmark is: How many out of 10 owners of board X would buy the test board? Because that's what you really want to know, right? And who better to tell you this than people who own the same board you do?

  • by molo ( 94384 ) on Friday September 19, 2003 @08:25PM (#7009323) Journal
    This is the problem: the graphics drivers check the process/executable to see what program is making the graphics calls. If it matches a known target profile (a benchmark, Quake 3, etc.), the rendering is tuned accordingly. (A rough sketch of this kind of check appears after this comment.)

    The problem here is that the Windows driver model allows the driver to check what program is making calls into it. This is not a bad thing by itself, so I wouldn't advocate getting rid of it.

    So... let's say you make a new benchmarking program and you don't leak any copies out to the graphics people. What happens when you release it? It might work and be fair on the current batch of drivers, but as soon as the graphics people get their hands on it, there's nothing you can do to prevent them from "optimizing" (tuning down rendering) for your benchmark.

    So maybe you can make a fair benchmark today. But as soon as you give it to anyone, don't bet on it being fair on the next driver revision.

    -molo
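A minimal sketch of the kind of executable-name check described in the comment above, as a user-mode driver DLL might do it on Windows. The executable names, profile names, and the detect_profile helper are all invented for illustration; real drivers keep this logic private.

```c
/*
 * Rough sketch of an executable-name check as a user-mode driver DLL
 * might perform it on Windows. The executable and profile names below
 * are invented for illustration only.
 */
#include <windows.h>
#include <string.h>

const char *detect_profile(void)
{
    char path[MAX_PATH];

    /* GetModuleFileNameA(NULL, ...) returns the full path of the
     * executable that loaded this DLL, i.e. the calling application. */
    if (GetModuleFileNameA(NULL, path, sizeof(path)) == 0)
        return "default";

    /* Keep only the file name portion of the path. */
    const char *exe = strrchr(path, '\\');
    exe = exe ? exe + 1 : path;

    if (_stricmp(exe, "quake3.exe") == 0)  return "quake3_tuned";
    if (_stricmp(exe, "3dmark.exe") == 0)  return "benchmark_tuned";
    return "default";
}
```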
  • Fair Benchmarks (Score:4, Insightful)

    by m0rph3us0 ( 549631 ) on Friday September 19, 2003 @08:33PM (#7009371)
    There is no such thing as a fair benchmark. Each person's needs differ, and therefore a different product suits each of them best. The best thing to do is grab demos of the things you like to do with your video card, then head down to your local computer store and see how they run.
    • I agree, there is always some way to screw with benchmarks, especially when there are so many settings in the display adapter properties that can hurt your performance or boost it.

      However, benchmarks are a good ballpark guide to whether I should buy an ATI or Nvidia. I am so glad I read the benchmarks when I made the ATI vs Nvidia decision recently. This time ATI won me over for price and performance. Previously it was NVidia. Who knows who will win next round. It seems that just as soon as you have
  • by Mad Quacker ( 3327 ) on Friday September 19, 2003 @08:50PM (#7009458) Homepage
    ...repeat it

    Does anyone still care about MIPS, MFLOPS, Dhrystone, Whetstone, or SPEC? Why do we want to rehash history with GPUs?

    If you want a synthetic benchmark, the companies will make their product work well with the benchmark, and little else. When the inevitable happens (as it has with both major players), you should neither get upset nor demand a better benchmark; instead, laugh when someone fronts a synthetic benchmark score.

    So you want to know if a card you are going to buy will work well for a game that is going to come out in 6 months to a year. We'd all like to know the future as well; I'd prefer a crystal ball.
  • by G4from128k ( 686170 ) on Friday September 19, 2003 @08:54PM (#7009486)
    One possibility is to have each vendor create two test suites -- a suite that the vendor thinks highlights the best performance features of their own system, and a suite that highlights the worst performance features of the competitor's system. For two vendors, this results in a total of four test suites (vendor 1's favorites, vendor 1's killer for vendor 2, vendor 2's favorites, vendor 2's killer for vendor 1).

    Then run all four suites on both systems and take normalized averages. The best system can win only by being robust and of overall high performance. With four tests in all, the vendor's own "best foot forward" suite can't overweight the result. And with the other vendor looking for any weaknesses, the downsides of each vendor's system become quite evident. (A toy version of this scoring appears after this comment.)

    Such testing may not produce over-optimized one-application super-stars, but it should lead to well-rounded graphics boards for high performance on a range of graphical display tasks.

    I bet that ATI and NVidia will never go for this approach because it would lead to real head-to-head fair competition as opposed to carefully staged, optimized, marketing-controlled demos.
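A toy version of the normalized-average scoring proposed above, assuming two cards and four suites. Every number is hypothetical; the point is only that dividing each result by the best score on that suite keeps any one suite from dominating the total.

```c
/*
 * Toy version of the normalized-average scoring described above:
 * two cards, four suites, every raw number here is hypothetical.
 */
#include <stdio.h>

#define NUM_SUITES 4
#define NUM_CARDS  2

int main(void)
{
    /* raw[card][suite]: e.g. average frames per second on each suite
     * (A's best case, A's anti-B suite, B's best case, B's anti-A suite). */
    const double raw[NUM_CARDS][NUM_SUITES] = {
        { 180.0, 90.0, 140.0, 60.0 },   /* card A */
        { 150.0, 70.0, 170.0, 95.0 },   /* card B */
    };
    double score[NUM_CARDS] = { 0.0, 0.0 };

    for (int s = 0; s < NUM_SUITES; s++) {
        /* Normalize each suite by the best result on that suite so no
         * single suite can dominate the overall average. */
        double best = 0.0;
        for (int c = 0; c < NUM_CARDS; c++)
            if (raw[c][s] > best)
                best = raw[c][s];
        for (int c = 0; c < NUM_CARDS; c++)
            score[c] += raw[c][s] / best;
    }

    for (int c = 0; c < NUM_CARDS; c++)
        printf("card %c: %.3f out of a possible %d\n",
               'A' + c, score[c], NUM_SUITES);
    return 0;
}
```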
    • Or they'd spend all of their time writing suites that crash the other guy's system.
      • This would drive both vendors to improve the robustness of their chips and drivers. Knowing that the competitor is going to try to crash your system would put pressure on the development team to avoid or fix bugs.

        These would be true test suites as opposed to nice speed demo suites. As a graphic board customer, I do want speed. But I would probably say that robustness has a higher implicit priority. A graphics chip that crashes is the last thing I want, regardless of how fast it is on some more limite
        • This would drive both vendors to improve the robustness of their chips and drivers. Knowing that the competitor is going to try to crash your system would put pressure on the development team to avoid or fix bugs.

          Here's the thing - when you run a game that crashes the graphics chips, you don't patch the drivers, you patch the game. Writing drivers that will survive running malicious code takes time away from addressing other programming issues, and the thing is that no one except for your competitor is wr
          • Writing drivers that will survive running malicious code takes time away from addressing other programming issues, and the thing is that no one except for your competitor is writing that kind of code into their app.

            What if somebody finds a way to break Windows through a video driver bug? What if somebody puts that exploit into the next Windows worm?

            The more fundamental problem is that all any kind of test can ever measure is your ability to do well at that test.

            And if that test measures a video car

    • This is not acceptable. The benchmarks cannot be developed by anyone influenced by the hardware manufacturers. Otherwise you'll have manufacturer A putting sleep calls in their anti-manufacturer-B benchmark and vice versa. Then you'll just have a test of how quickly the computer finishes 2000 calls to sleep for 100ms, rather than 2000 calls to draw the screen and swap buffers.

      Then you'll have driver manufacturers figuring out a way to disable the sleep system call....

      • Perhaps I did not explain the idea well enough. Since manufacturer A also has to run the anti-manufacturer-B test suite, any sleep calls will affect both of them. Because every card has to run ALL of the tests (both the "best-case" tests and "worst-case" tests of all cards), each manufacturer must make sure that their own card can handle whatever they are trying to throw at the competitor's card.

        Sleep calls cannot bias the results unless the two cards have different definitions of "sleep." Bypassing sl
        • Sleep calls don't go to the card. They tell the scheduler "don't run this program for the next X milliseconds." The scheduler will not schedule the test program at all. All the other manufacturer has to do is put more sleep calls in than the first manufacturer.
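A small sketch of why a hidden sleep call skews a wall-clock benchmark, as the comment above explains: the scheduler simply stops running the test process for that interval, so the measured time grows no matter what the card is doing. The Windows timing calls here are just one way to show it.

```c
/*
 * Tiny illustration of the point above: a Sleep() call only tells the
 * scheduler to stop running this process, yet any wall-clock benchmark
 * will happily count that idle time as "frame time".
 */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&t0);
    /* Stand-in for a "frame" that secretly contains a sleep call. */
    Sleep(100);                               /* 100 ms of doing nothing */
    QueryPerformanceCounter(&t1);

    double ms = 1000.0 * (double)(t1.QuadPart - t0.QuadPart)
                       / (double)freq.QuadPart;
    printf("measured frame time: %.1f ms\n", ms);
    return 0;
}
```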

  • by Henry V .009 ( 518000 ) on Friday September 19, 2003 @08:56PM (#7009498) Journal
    The problems with benchmarking graphics cards have traditionally been:
    1. How do you benchmark image quality?
    2. How do you compare different performance advantages in different areas?
    3. How do you stop the card manufacturers from cheating on the tests?
    The only way to test the first is with the human eye. You need to look at two images and make a subjective decision on which is better. And the programs that generally have the right amount of graphical frills are popular games.

    The performance question is harder. But again, popular games level the playing field. When you benchmark using a game you know that programmers are actually using the features you are testing.

    And finally, there is the matter of cheating. If a manufacturer is noticeably decreasing image quality for frame rate, he is usually "cheating." When image quality is maintained, it is an optimization. So again, it becomes a matter of subjective judgments of the human eye.

    Subjective judgments are not so bad, of course. A five-star restaurant is only subjectively better than a two-star restaurant, but usually that will mean a lot to the customer. So we can tolerate the errors that come from benchmarking cards with games pretty well. When manufacturers pull their tricks, you can bet that the review sites will be there to catch them.
  • In graphics, everything is redundant because you really can't see that lone pixel among the other 1920x1440. So the solution is to render one out of every four polygons... tada, 4x performance.
    • Re:Cheating 101 (Score:1, Interesting)

      by Anonymous Coward
      Yes that'll work... right up until the drivers decide to drop that huge polygon that was supposed to be part of a mountainside.
  • by 0x0d0a ( 568518 ) on Saturday September 20, 2003 @04:52AM (#7010854) Journal
    So...what exactly is wrong with this?

    I can't see why you'd care whether a vendor is "cheating" or not. Let's say that you're a Tribes 2 fan. You run out and look at Tribes 2 benchmarks in reviews. The reviewer says something about image quality, and includes bits of screenshots (I vaguely remember this happening with the Riva128 and G200 the last time I purchased a 3D card for gaming). End of story.

    Now, there are a couple of possibilities. First, neither you nor the reviewer can see the image quality degradation that's taking place, and you do notice the speed increase. That's not cheating! The card vendor has just figured out a way to provide you with more of the resources you care about at the cost of something that you don't even notice. We do this all the time with lossy compression in JPEG and MP3 -- you don't care about 90% of the data, but you do care about the size savings. People didn't care when lossy texture compression became the standard on video cards, because the only thing lossless compression gives them is a psychological "this is a flawless image".

    Another possibility is that the reviewer or you notice image quality degradation. If this is the case, the card gets a lower image quality score. Big deal!

    Finally, you may be worried about game-specific tweaking in that the game won't provide a representative sample of how the card will do on other games. This is *always* the case! Cards could perform quite differently on any set of games just due to the fact that designs differ, and different things form a bottleneck on different cards in different games.

    Just let some reviewer sit down and try the stupid card out, and if they're enjoying the card...hey, who cares what hacks are included in the driver?
    • Well... the problems escalate when the drivers are tweaked only for those default benchmark runs, with precalculated data for them.

      That is, the game itself will NOT run as the benchmark portrays, the tweaks being useless for normal gaming.

      If there were any sanity in how the drivers act, then from a programmer's point of view the program should be tweaked for the drivers, not the other way around (the driver should just do what the spec says, and do it exactly). I fail to see the point in whoring the driv

      • Also, it's going to get to the point where the cheat could be that the driver detects when a screenshot is taken and then boosts the quality for the current frame.

        I haven't heard of it happening, but that's where it's headed.
        • In fact, this HAS already happened.

          Someone figured out that two or three releases ago, the Nvidia Detonator drivers did exactly that: detecting screenshots and boosting the quality for that frame.

          Unfortunately, it is difficult to determine if the drivers are still exhibiting that behavior, because Nvidia now supplies drivers where the code is encrypted, and decrypted in a 'just-in-time' fashion.

          Sketchy. Very very sketchy. ATI for me.
  • You need to find the video card that produces the most winners. Scrounge up some cash, run a LAN party, and collect info on the video cards people are using.

    The card that correlates to the most wins is obviously the superior video card.
  • For OpenGL (but not for DirectX) there are benchmarks that check that the scene was rendered correctly by reading back the rendered image. So there's an objective definition of correctness. Run them and check. (A rough readback sketch appears after this comment.)

    Once that's out of the way, the next step is to crank up scene complexity until the rendering rate drops. Crank up the polygon count, the texture count, the shader count, etc. until the card misses a frame refresh time. That's what matters when you're running 3D applications. It's also what m
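A rough sketch of the readback idea described above, assuming plain OpenGL and a frame that has just been rendered. The reference image, tolerance, and count_mismatches helper are placeholders; a real conformance test would ship its own known-good data.

```c
/*
 * Rough sketch of the readback check, assuming plain OpenGL and an
 * already-rendered frame. The reference image and tolerance are
 * placeholders; a real conformance test ships known-good data.
 */
#include <GL/gl.h>
#include <stdlib.h>

/* Returns the number of color channels that differ from the reference
 * image by more than 'tol', or (unsigned long)-1 on allocation failure. */
unsigned long count_mismatches(const unsigned char *reference,
                               int width, int height, int tol)
{
    size_t n = (size_t)width * height * 3;     /* RGB, 8 bits per channel */
    unsigned char *pixels = malloc(n);
    unsigned long bad = 0;

    if (!pixels)
        return (unsigned long)-1;

    /* Read the frame that was just rendered back out of the framebuffer. */
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, pixels);

    for (size_t i = 0; i < n; i++) {
        int diff = (int)pixels[i] - (int)reference[i];
        if (diff < -tol || diff > tol)
            bad++;
    }

    free(pixels);
    return bad;
}
```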

  • Randomize your benchmark. It'll take a few more runs to get an average performance figure, but then the benchmark is immune to cheating drivers.
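One possible shape for such a randomized benchmark: derive the camera path from a seed chosen at run time and publish the seed with the score, so drivers cannot carry precomputed results for a fixed fly-through. The waypoint count and scene bounds below are arbitrary.

```c
/*
 * Sketch of a seed-driven benchmark: the camera waypoints are generated
 * from a run-time seed (published alongside the score), so a driver
 * cannot carry precomputed results for one fixed fly-through.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static float frand(void)
{
    return (float)rand() / (float)RAND_MAX;
}

int main(void)
{
    unsigned seed = (unsigned)time(NULL);
    srand(seed);
    printf("benchmark seed: %u\n", seed);   /* report it for reproducibility */

    /* Eight random camera waypoints inside a 100 x 10 x 100 test scene;
     * the renderer (not shown) would fly through these on every run. */
    for (int i = 0; i < 8; i++) {
        float x = frand() * 100.0f;
        float y = frand() * 10.0f;
        float z = frand() * 100.0f;
        printf("waypoint %d: (%5.1f, %4.1f, %5.1f)\n", i, x, y, z);
    }
    return 0;
}
```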

"All the people are so happy now, their heads are caving in. I'm glad they are a snowman with protective rubber skin" -- They Might Be Giants

Working...