Monday, June 30, 2008

Table partitioning to improve stats

In an earlier post about legacy databases I mentioned that dirty data is more likely to have entered the database at an earlier stage. One day while I was looking at the selectivity of a few columns I found the values a bit strange. After some digging I noticed that most of the time the data was OK, even though a proper foreign key constraint was missing for the columns. But at some point many years ago somebody had inserted something completely different, which had clearly affected the selectivity (the columns are indexed).
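
For reference, this is roughly how I check column selectivity from the dictionary; the table name TRANS and the column names here are made-up placeholders, not the actual table:

    select column_name, num_distinct, density, num_nulls, last_analyzed
      from user_tab_col_statistics
     where table_name = 'TRANS'
       and column_name in ('STATUS_CODE', 'BRANCH_CODE');

A handful of unexpected distinct values inserted years ago is enough to push num_distinct and density away from what the recent data would suggest.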

Table partitioning serves more to assist in the management of data than for tuning purposes, and I imagined that this would be another example of that. By range partitioning on transaction time, the dirty data would be kept in a partition separate from the more recent history and thus improve the column stats. Though I haven't set up a test case (for lack of a real problem), I believe that correct selectivity will help the CBO and avoid these border cases where the CBO is tricked into choosing a wrong plan based on wrong statistics. That would be another example where the CBO is blamed for something that is not its fault.
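
A minimal sketch of the idea, with invented table, column and date values just for illustration: the old, suspect rows end up in their own partition by range partitioning on the transaction date.

    create table trans_part (
      trans_id    number,
      trans_date  date,
      status_code varchar2(10)
    )
    partition by range (trans_date) (
      partition p_legacy  values less than (to_date('2000-01-01', 'YYYY-MM-DD')),
      partition p_current values less than (maxvalue)
    );

With partition-level stats in place, queries that prune down to the recent partitions would be costed on statistics unaffected by the legacy rows.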

This is why I don't like hints; it feels like cheating and being lazy. The CBO may have bugs, but I think you have to be as good as the author of this book to prove it; usually there is an underlying problem, like the one I encountered.

Anyway, I did a quick test by copying the table in question into a partitioned version and running dbms_stats.gather_table_stats on it. In fact the selectivity did not change much between partitions, probably not enough to make an impact on the CBO's choice of plan. But since I had used other parameters on the new table (most importantly, I think, estimate_percent=>100), the columns in the two tables had very different values for selectivity. This reiterated something I already knew: checking the stats and the way they are collected is important, and sometimes a simple solution will do.
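
The call I used was along these lines; the table name is again a placeholder, and the exact parameters on the original table were different (which is the whole point):

    begin
      dbms_stats.gather_table_stats(
        ownname          => user,
        tabname          => 'TRANS_PART',
        estimate_percent => 100,
        granularity      => 'ALL',
        cascade          => true);
    end;
    /

Gathering with a 100 percent sample on the copy, against whatever (smaller) sample had been used on the original, explains much of the difference in the recorded selectivity.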