Thursday, May 7, 2009

Do we still need DBA ? A short story about Oracle CBO and data. – part 1

What is a Cost Based Optimizer ?
It is a mathematic model which is trying to estimate using some calculation what will be a best execution plan for particular SQL. All the calculations are based on objects (tables and indexes) statistics, instance statistics and some build in rules. Step by step Oracle CBO is calculating a cost of query (join by join, filter by filter) but some very important part is missing from calculations. What is missing from that ?
A data itself. CBO (in normal mode) doesn’t care about data of course there can be some histograms or column selectivity but in general there is no overview of values of rows.

Why I’m writing about that – I have created a 14 database for 14 branches of that same intuition. There is one source system which is replicated to BI databases and in each database a materialized view based on that same query are generated.
For 12 database it was working perfectly well, but for 2 doesn’t at all. During creation or refresh phase a following error appear:

ORA-01652: unable to extend temp segment by 64 in tablespace TEMP

My first try was to increase a TEMP tablespace from 2 to 4 GB. But it didn’t help.
OK – lets resize once again – I have set up a autoextend for that file and maximum file size has been set to 20 GB. Should be enough – that tables has no more then 100 000 rows. But not,
After 15 min still

ORA-01652: unable to extend temp segment by 64 in tablespace TEMP

This query has been joined about 12 tables. Nine from them was really small and have a good filter condition – a number of rows for that 9 join was really small between 20 and 500. But these tables have been joined to 2 big tables which have thousands of rows. A join condition between was really unique so overall results of query was limited to hundreds or thousands of rows.
I started with review of query to find a join which generate a lot of rows which has to be save in temporary space, after a few tries I had one !
Two tables have been joined on both columns – unfortunately it was not a unique join.

In table A it was four pair of values which meet join condition

Col1 Col2 Rows COUNT(*)
6001 3 8811
6001 4 292
6001 2 8246
6001 1 3292

In table B I got that same number of pairs

Col1 Col2 Rows COUNT(*)
6001 3 2700
6001 4 75
6001 2 10495
6001 1 6813

So if we multiple every join condition it is about 130 mln of rows generated from that query.
Of course design was a little bit different – there are more tables and joins which limit number of rows returned to user – but CBO decide to add that joins and filter after that.
So yes – at this stage I can say we still need DBA.

Next investigation in near future.