There is a lot of discussion about the usability of such tools: is the measured performance anywhere close to reality? The load these tools generate is more or less synthetic. One tool that stands out from the crowd is Real Application Testing (RAT), but RAT requires an extra license from Oracle. With RAT you can capture real load on a production system and replay it on a test system with new or upgraded hardware or configuration. The reports generated in Enterprise Manager let you compare the performance of the old and new system so clearly that even your CEO can understand it. You cannot get much closer to reality than with RAT, but as said, it costs money and takes time to set up. [And for a change I will not whine any more about the fact that SMBs around here do not invest in the licensed packages that make the life of a DBA easier.]
Back to the question of the reliability of these tools: the fact that the load is often quite artificial and likely to be very homogeneous (or randomized without the skew you have in your real-world data) means that they give an upper limit you are not likely to reach in normal production. That was one of the reasons why Kevin Closson wrote the Silly Little Oracle Benchmark (SLOB). After having played with these tools I have reached the preliminary conclusion that they are more useful for comparing configurations and behavior than for ranking your system.
Over the last few days I have started to compare this with the infamous buffer cache hit ratio (BCHR) - see this post on the subject on Jonathan Lewis' blog. Though the BCHR is a fairly meaningless measurement, you may still know what it used to be for your most important databases, since several monitoring systems out there still ship with an Oracle module that measures it and sends you an alert if it is too low (too low according to the assumptions Lewis and others attacked a long time ago). But what if your BCHR suddenly changes; what does that mean? It means just that: something changed. And that may be worth investigating. Think of a kid who always makes some noise: though silence is welcome, if he suddenly stops his rattling you want to know why; silence is not always a good sign. Translated back to the BCHR, an increased ratio could be caused by some sessions reading indexes en masse when a full table scan (FTS) would have performed better.
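For reference, the classic formula those monitoring modules tend to compute looks roughly like this when expressed as a query against v$sysstat (a sketch only - the exact statistics and scoping vary from tool to tool):

    -- Classic instance-wide BCHR since startup; the very formula
    -- Lewis and others have shown to be unreliable as a health metric
    select round(100 * (1 - phy.value / (cur.value + con.value)), 2)
           as bchr_pct
      from v$sysstat cur, v$sysstat con, v$sysstat phy
     where cur.name = 'db block gets'
       and con.name = 'consistent gets'
       and phy.name = 'physical reads';

A single number like this says little on its own; as argued above, it is the sudden change in it that may be worth a look.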
And what is the connection to load testing? By repeating the tests several times and seeing the same pattern again and again, the tool confirmed itself; when running it against another system the pattern changed, but remained consistent on the new system. The measured limit itself is probably off from reality, but I think it is reasonable to assume that when the performance becomes erratic and unstable (meaning widely varying results that are not linearly or exponentially linked to the increasing load, or less fancy: the graph you are looking at goes up and down on the right side), you have found some upper limit of the load the SAN can take. If the performance on the new SAN remains stable under more load than the old one, it will probably perform better in real-world production as well; just don't believe you have found the correct upper limit for your application. That was pretty much our ambition: compare the old and new SAN, and possibly discover any obvious configuration errors.
In my case I wanted to do some measurements of I/O performance so I could compare the old EVA with the new 3PAR SAN. First out was ORION (Oracle I/O Numbers). ORION is included with the database software; you'll find it under $ORACLE_HOME/bin, or you can download it from the page I just linked to. The general idea behind ORION is that it executes more or less the same code path as the Oracle database does when doing I/O. There is no application testing involved, just a bunch of I/O operations thrown against your storage. You can test small and large I/O to simulate the load from OLTP, DSS, or mixed environments. ORION is supported on several platforms, including Windows. It reports latency (in microseconds), throughput (megabytes per second), and I/O operations per second (IOPS). The results are written to text files in CSV format that can easily be imported into Excel for easy visualization (or, much better, imported into your precious database and devoured with Apex).
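Just to give an idea of what a run looks like, here is a minimal sketch (the test name and device paths are placeholders; see the ORION documentation for the full set of parameters):

    # mytest.lun lists the devices to test, one path per line;
    # on Windows that could be raw device names like these
    # (placeholders - adjust to your own LUNs):
    #
    #   \\.\PhysicalDrive2
    #   \\.\PhysicalDrive3

    # "simple" runs small random I/O (OLTP-like) and large random
    # I/O (DSS-like) as two separate passes over those LUNs
    orion -run simple -testname mytest -num_disks 2

    # Output: mytest_summary.txt plus mytest_iops.csv,
    # mytest_mbps.csv and mytest_lat.csv - ready for Excel or Apex

The -num_disks parameter tells ORION how many physical spindles sit behind the LUNs, so it can scale the range of load levels it generates.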
I tried to get SLOB to work on Windows, but couldn't find an easy way to replace the semaphore mechanism Closson uses on Unix to start all sessions at once. I could have written something in Java or another language, but I really didn't have the time and it seemed like overkill.
Next out was Hammerora. It uses Tcl/Tk and is also supported on various platforms, but it was very unstable on the Windows 7 32-bit machine I was using. It would run for a while and then suddenly, at a random point (sometimes after many minutes of testing), hit a Tcl/Tk error that required a restart. I think this was caused by the combination of the OS and Tcl/Tk.
Really wanting to test both SLOB and Hammerora, I installed VirtualBox on the PC and started with Hammerora there. Graphs and numbers are coming this week, but in general the results from last week were that ORION indicated stable and consistently better performance on the 3PAR, with a less erratic pattern. Hammerora actually reported better results from VirtualBox than when running directly on the host OS (Windows 7). Depending on how much idle time I have in the coming weeks, I'll try out SLOB, Swingbench, and more to compare the two SANs.