Monday, August 20, 2012

Oracle load testing - part 3 Hammerora results

I learned something very important when testing with Hammerora. The documentation is quite good and makes a simple but important point about planning and preparation. To quote the documentation for Oracle OLTP testing:

Planning and Preparation is one of the most crucial stages of successful testing but is often overlooked. Firstly you should fully document the configuration of your entire load testing environment including details such as hardware, operating system versions and settings and Oracle version and parameters. Once you have fully documented your configuration you should ensure that the configuration is not changed for an entire series of measured tests. This takes discipline but is an essential component of conducting accurate and measured tests.
From this I conclude that many tests I've seen and done myself have not been accurate. As stated earlier the goal for this testing was just to compare performance before and after migration from EVA to 3PAR.

Since this customer did not have the license required to run AWR, I made the following change in the driver script to create a Statspack snapshot instead of an AWR snapshot. Search for the string dbms_workload and replace
set sql1 "BEGIN dbms_workload_repository.create_snapshot(); END;"
with
set sql1 "BEGIN perfstat.statspack.snap; END;"
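
The same substitution can also be scripted. This is a minimal sketch in Python, assuming the driver script has been saved as plain text and contains the AWR call exactly as shown above (the wording may differ between Hammerora versions). The function name use_statspack is my own and not part of Hammerora.

```python
# Snapshot calls as they appear in the driver script lines quoted above.
AWR_CALL = "dbms_workload_repository.create_snapshot();"
STATSPACK_CALL = "perfstat.statspack.snap;"


def use_statspack(script_text):
    """Swap every AWR snapshot call in the script text for a Statspack one.

    Returns the patched text and the number of calls that were replaced,
    so a count of 0 signals that the script did not contain the expected call.
    """
    count = script_text.count(AWR_CALL)
    patched = script_text.replace(AWR_CALL, STATSPACK_CALL)
    return patched, count


if __name__ == "__main__":
    line = 'set sql1 "BEGIN dbms_workload_repository.create_snapshot(); END;"'
    patched, count = use_statspack(line)
    print(count, "call(s) replaced")
    print(patched)
```

Checking the returned count before running a test is a cheap way to make sure snapshots will actually be taken.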

When testing with Hammerora I decided to run each test three times to see if the numbers were consistent. I recorded the numbers of each run in a spreadsheet, as shown in the following table for the tests on EVA:

Vusers  Run  Report   tpm    nopm   Avg_tpm
1       1    1_2      7158   2369   7634
1       2    11_12    7874   2645
1       3    21_22    7868   2804
3       1    31_32    16478  5765   17317
3       2    41_42    17678  6256
3       3    43_44    17794  6130
10      1    45_46    27847  9959   33225
10      2    51_61    32581  11600
10      3    71_81    39248  13701
20      1    91_101   47489  17441  47075
20      2    111_121  63062  22658
20      3    131_141  30674  11116
30      1    151_161  54349  19756  44186
30      2    171_181  45628  17331
30      3    191_201  32581  12733

Vusers is the number of virtual users in Hammerora, and Run is 1 to 3 for each new setting of Vusers. Report refers to the Statspack report created from the snapshots taken before and after each run. Tpm (transactions per minute) and nopm (new orders per minute) are as reported by Hammerora, and Avg_tpm is the average tpm in each group of three runs. Compare this to the numbers for the 3PAR:

Vusers  Run  Report   tpm    nopm   Avg_tpm
1       1    9_10     8246   2815   8262
1       2    11_12    7983   2717
1       3    13_14    8556   2956
3       1    15_16    22652  7854   22881
3       2    17_18    22652  7831
3       3    19_20    23339  7994
10      1    21_22    33539  11767  33191
10      2    25_26    39054  13729
10      3    27_28    26981  9428
20      1    29_30    47134  16462  47356
20      2    31_32    46436  16330
20      3    33_34    48497  17023
30      1    35_36    53197  18902  50788
30      2    37_38    44980  15994
30      3    39_40    54187  19033

The repeated tests for the same number of virtual users do not vary as much on the 3PAR as on the EVA. Also, up to 10 virtual users the numbers for the EVA tended to improve from run to run, maybe due to some caching taking place.
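
To put a number on the run-to-run variation, here is a minimal sketch in Python that recomputes the group averages and a simple spread for both arrays. The tpm values are copied from the two tables. The spread measure, (max - min) / average, is my own choice for illustration and not something Hammerora reports, and the names summarize and report are hypothetical.

```python
from statistics import mean

# tpm for each of the three runs, keyed by number of virtual users.
EVA = {
    1: [7158, 7874, 7868],
    3: [16478, 17678, 17794],
    10: [27847, 32581, 39248],
    20: [47489, 63062, 30674],
    30: [54349, 45628, 32581],
}
THREEPAR = {
    1: [8246, 7983, 8556],
    3: [22652, 22652, 23339],
    10: [33539, 39054, 26981],
    20: [47134, 46436, 48497],
    30: [53197, 44980, 54187],
}


def summarize(runs):
    """Return (average tpm, spread in percent) for one group of runs.

    Spread is (max - min) / average, expressed as a percentage.
    """
    avg = mean(runs)
    spread_pct = 100.0 * (max(runs) - min(runs)) / avg
    return avg, spread_pct


def report():
    """Build one text line per Vusers setting, comparing the two arrays."""
    lines = []
    for vusers in EVA:
        eva_avg, eva_spread = summarize(EVA[vusers])
        par_avg, par_spread = summarize(THREEPAR[vusers])
        lines.append(
            f"{vusers:>6} {eva_avg:>9.0f} {eva_spread:>8.1f}% "
            f"{par_avg:>9.0f} {par_spread:>8.1f}%"
        )
    return lines


if __name__ == "__main__":
    print("Vusers   EVA_avg  EVA_sprd  3PAR_avg 3PAR_sprd")
    for line in report():
        print(line)
```

On these numbers the spread comes out smaller on the 3PAR at every user count except 10, where both arrays vary by roughly a third of the average. The computed averages agree with the Avg_tpm column to within one tpm.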

The 3PAR seemed to be more reliable for the same number of virtual users, as can be seen in these screen captures. The first is for 20 virtual users on the EVA:

The performance on the 3PAR does not change much during the test (20 virtual users):

You'll see that at one point the EVA seems to perform better, but I would rather have stable and less erratic performance with the 3PAR than a system with occasional good performance.

All in all it was very easy to play around with Hammerora. It is easy to set up, so you can spend your time on planning and executing the tests. I also like how you can observe the change in performance over time. Hammerora is clearly a tool I will use more in the future.

2 comments :

Rock Den said...

The process outlined below is not a method for stress testing your app. It’s not designed to calculate the load that can be applied, instead it's used to see the trend in the Performance Testing of the app.

Oyvind Isene said...

Rock Den, thanks for stopping by. Above, you mean :) Right, "it is not a method for stress testing your app", since there was no app involved besides the testing tool itself. The expression load testing was used generically, and there was no attempt to calculate the load that can be applied, but rather to look for a pattern and compare two systems.