TPC-C Benchmarks For JDBC?
woggo asks: "I need to benchmark two different JDBC drivers for a research project and would like to use a standard benchmark. I was able to find this implementation of TPC-W, but that is too much of a test of the Web server to be useful for my purposes. Does anyone know of a freely-available Java implementation of TPC-C? It needs to be reasonably conformant and I need to be able to cite the results in a paper without violating a license agreement, which would seem to exclude evaluation versions of products."
JMeter to the rescue (Score:2)
JBench is a TPC-A and TPC-C implementation (Score:3)
Check here [openlinksw.com] for more info.
You're up a creek on this one. (Score:4)
The TPC is correct in stating that a benchmark is not about one particular component: you have to look at how a whole system performs under a specific workload, which makes it genuinely difficult to measure any one component in isolation. If you're testing just the driver, you need to compare the drivers against the same workload, same schema, same data, same database. You can't meaningfully compare the WebLogic Type-4 driver for Sybase against the open-source Postgres Type-2 driver, because they talk to different databases. And since different databases are optimized for different things, you'll need to bear that in mind (you probably know this already, but it's important when constructing a benchmark for your application).
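To make the "same workload, same database" point concrete, here is a minimal sketch in Java of a fair driver comparison: one timing harness, and both drivers fed an identical operation. The JDBC URLs in the comments are hypothetical placeholders, and the harness is illustrative, not a rigorous benchmark.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DriverBench {
    // Time the same workload for each driver. The warm-up pass keeps JIT
    // compilation of the workload from skewing whichever driver runs first.
    static long benchmarkMillis(int iterations, Runnable op) {
        for (int i = 0; i < iterations / 10 + 1; i++) op.run(); // warm-up
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) op.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // In a real run, each Runnable would open a Connection through one
        // of the two drivers (hypothetical URLs) against the SAME database,
        // schema, and data, then execute an identical transaction mix:
        //   DriverManager.getConnection("jdbc:driverA://host/tpcc");
        //   DriverManager.getConnection("jdbc:driverB://host/tpcc");
        AtomicInteger calls = new AtomicInteger();
        long ms = benchmarkMillis(1_000, calls::incrementAndGet);
        System.out.println("ran " + calls.get() + " ops (warm-up + timed) in " + ms + " ms");
    }
}
```

The only thing that should differ between the two timed runs is the driver; everything else, including the warm-up, stays fixed.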
Next is the issue of actually publishing benchmark results. First of all, you cannot publish a full TPC benchmark without it being audited. You can use their data and their queries, but you cannot publish the official composite metrics (such as QppD or QthD), because the TPC does not permit it. Look at how academic research papers in the database field handle this: if they're measuring query performance, they give you comparative numbers on the individual queries, and maybe roll them up into some artificial figure, but they never claim an official result. If you're looking at the TPC benchmarks, that's a very important consideration.
Now you get to the issue of publishing a result on ANY commercial database. Not going to happen. Database vendors' license agreements forbid publishing benchmarks of their software under any circumstances, even for academic purposes. The only people who get around that are researchers working inside the companies, such as the IBM Almaden group. Since you can't publish such results easily, have your advisor look into what you can legally use and publish. You might be stuck with Postgres or MySQL only.
And finally, there's the issue of what TPC-C really is. Nobody runs TPC-C in Java, and nobody runs it over JDBC. Have you ever read a full disclosure report for an audited run? Not just the executive summary, the FULL disclosure report. They don't use Java. Hell, they don't even use SQL. They code directly against a low-level API that is completely native, and completely in C. Because here's a big insight that the TPC and the RDBMS vendors will never tell you: TPC-C is designed to show how quickly an RDBMS can act just like a mainframe. That isn't very fast at all, so they have to "cheat" by taking SQL out of the picture. Read their client code: it's all native C library calls.
So with this we come to my recommendation: Use the TPC-W benchmark, for the following reasons:
That's what people use JDBC for. JDBC isn't used for TPC-C or TPC-H or TPC-R workloads; it's used for web workloads. Why not use the benchmark designed to simulate web workloads with a technology designed for web pages?
You can always play with the parameters. Don't like the reliance on static pages? Take them out entirely and talk directly to the middle tier, or shrink their share of the mix until they no longer factor heavily. If you can't tune this to the point where you see a performance difference, then the result is simple: JDBC performance doesn't matter. That's a pretty good result in itself.
You can also just take the bits out you care about. Are you willing to munge your own source code? Then extract the database code from TPC-W and run that.
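For a sense of what "the database code" extracted from such a benchmark looks like, here is a minimal sketch of one TPC-C-style New-Order order line in plain JDBC. The table and column names follow the TPC-C schema loosely, and the restock rule (add 91 when the quantity would fall below 10) follows the spec's New-Order profile; treat the whole thing as illustrative, not conformant.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class NewOrderLine {
    // TPC-C-style stock rule: decrement on-hand quantity, restocking by 91
    // whenever the result would otherwise fall below 10.
    static int newQuantity(int onHand, int ordered) {
        int q = onHand - ordered;
        return q >= 10 ? q : q + 91;
    }

    // One order line against a TPC-C-style STOCK table. The caller owns the
    // transaction (autocommit off) and commits after all lines are processed.
    static void processLine(Connection conn, int itemId, int warehouseId, int ordered)
            throws SQLException {
        int onHand;
        try (PreparedStatement sel = conn.prepareStatement(
                "SELECT s_quantity FROM stock WHERE s_i_id = ? AND s_w_id = ?")) {
            sel.setInt(1, itemId);
            sel.setInt(2, warehouseId);
            try (ResultSet rs = sel.executeQuery()) {
                rs.next();
                onHand = rs.getInt(1);
            }
        }
        try (PreparedStatement upd = conn.prepareStatement(
                "UPDATE stock SET s_quantity = ? WHERE s_i_id = ? AND s_w_id = ?")) {
            upd.setInt(1, newQuantity(onHand, ordered));
            upd.setInt(2, itemId);
            upd.setInt(3, warehouseId);
            upd.executeUpdate();
        }
    }
}
```

Even this small read-modify-write pattern is enough to exercise a driver's prepared-statement and round-trip behavior, which is what a driver comparison actually cares about.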
You might also want to look at AS3AP. It's good for measuring raw concurrency, but it is a completely artificial benchmark with no relation to the real world.
Much like any other Ask Slashdot, we're not given enough detail to offer a definitive answer. But hopefully this helps. If you care, email me for more information.
Kirk Wylie
looks like you're in the clear (whew!!) (Score:1)
UCITA Reporter Ray Nimmer complained of "distortions" in the debate on UCITA, identifying as a "misrepresentation" the claim that "UCITA allows licensors to prevent licensees from commenting about the products."[111] He said that "This allegation makes nice copy and superficial impact, but is simply untrue. You can scroll through the UCITA draft and will not find any such provision."
that was a close call.
Re:You're up a creek on this one. (Score:2)
Thanks for your thoughts. The reason I'm "benchmarking drivers" is that one driver, which I am developing, has a client-side relation cache and delegates to the native driver on cache misses. I don't think my question was inexact: I asked for a TPC-C implementation in Java, not for "a db benchmark in Java". I asked for a specific thing, not for a recommendation of a type of thing.
The plan is to compare "apples to apples" by running my driver on top of a native driver against the native driver alone under a simulated "real" workload, using some benchmark that most people in the field will have a frame of reference for. Since I don't have a $50k machine at my immediate disposal, I won't be publishing numbers directly comparable to the published TPC results, but the ratio between the two should be meaningful, and the fact that it is TPC-C should tell readers what sort of test it is.
I'm not interested in publishing "TPC results" for the purpose of selling hardware or software. I'm interested in saying "On this TPC-C-like workload, the driver with the cache performed n times better than the driver without." Well, that's not entirely true. (I am at Wisconsin, the "inspiration" for the benchmark clause since 1983....) Although I was planning to test cached Postgres vs. uncached Postgres, DB2 does allow one to publish benchmarks. And if you want to publish relative benchmarks for a system with different parameters/drivers/etc. (as many papers have done), it is straightforward to do so legally: just compare "DBMS A" to "DBMS B". That's what the papers I've read lately have done. However, Oracle-to-Sybase comparisons are not what I'm interested in, as I said.

Thanks for your help, and I'm sorry I wasn't clearer: I want to compare the total throughput of two drivers against the same database. I may wind up using only the database code from TPC-W, but in that case I think it would be smarter to just write my own benchmark, since "the db code from TPC-W" would have little more credibility or recognizability than "this benchmark with the following constraints...."
cheers,
wb
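The caching driver wb describes, where hits are served client-side and misses are delegated to the native driver, can be sketched as a generic read-through cache. This is a toy illustration: the class and method names are invented, and the "native driver" is stood in for by a plain loader function rather than a real JDBC connection.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy read-through cache in the spirit of the delegating driver described
// above: repeated reads are served from a client-side map, and only misses
// fall through to the underlying "native driver" (here, a loader function).
public class RelationCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> nativeLoader;
    private int misses = 0;

    public RelationCache(Function<K, V> nativeLoader) {
        this.nativeLoader = nativeLoader;
    }

    public V get(K key) {
        if (!cache.containsKey(key)) {
            misses++;                              // miss: delegate downward
            cache.put(key, nativeLoader.apply(key));
        }
        return cache.get(key);                     // hit: no round trip
    }

    public int misses() { return misses; }
}
```

The benchmark wb proposes then reduces to running the same workload with and without this layer in front of the native driver and reporting the throughput ratio.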