Thursday, May 7, 2009

Do we still need DBA ? A short story about Oracle CBO and data. – part 1

What is a Cost Based Optimizer ?
It is a mathematic model which is trying to estimate using some calculation what will be a best execution plan for particular SQL. All the calculations are based on objects (tables and indexes) statistics, instance statistics and some build in rules. Step by step Oracle CBO is calculating a cost of query (join by join, filter by filter) but some very important part is missing from calculations. What is missing from that ?
A data itself. CBO (in normal mode) doesn’t care about data of course there can be some histograms or column selectivity but in general there is no overview of values of rows.

Why I’m writing about that – I have created a 14 database for 14 branches of that same intuition. There is one source system which is replicated to BI databases and in each database a materialized view based on that same query are generated.
For 12 database it was working perfectly well, but for 2 doesn’t at all. During creation or refresh phase a following error appear:

ORA-01652: unable to extend temp segment by 64 in tablespace TEMP



My first try was to increase a TEMP tablespace from 2 to 4 GB. But it didn’t help.
OK – lets resize once again – I have set up a autoextend for that file and maximum file size has been set to 20 GB. Should be enough – that tables has no more then 100 000 rows. But not,
After 15 min still

ORA-01652: unable to extend temp segment by 64 in tablespace TEMP

This query has been joined about 12 tables. Nine from them was really small and have a good filter condition – a number of rows for that 9 join was really small between 20 and 500. But these tables have been joined to 2 big tables which have thousands of rows. A join condition between was really unique so overall results of query was limited to hundreds or thousands of rows.
I started with review of query to find a join which generate a lot of rows which has to be save in temporary space, after a few tries I had one !
Two tables have been joined on both columns – unfortunately it was not a unique join.

In table A it was four pair of values which meet join condition

Col1 Col2 Rows COUNT(*)
6001 3 8811
6001 4 292
6001 2 8246
6001 1 3292

In table B I got that same number of pairs

Col1 Col2 Rows COUNT(*)
6001 3 2700
6001 4 75
6001 2 10495
6001 1 6813

So if we multiple every join condition it is about 130 mln of rows generated from that query.
Of course design was a little bit different – there are more tables and joins which limit number of rows returned to user – but CBO decide to add that joins and filter after that.
So yes – at this stage I can say we still need DBA.

Next investigation in near future.

Thursday, April 23, 2009

Mutex: Reloaded

Different Database but again mutex issue:

And again I must say I hate mutex! Need to find a blocking session...

SQL> select p2raw, count(*) from (select * from v$session where event = 'cursor: pin S wait on X') group by p2raw order by 2;
P2RAW COUNT(*)
---------------- ----------
000005D500000000 239

... 000005D5 in HEX -> 1493 in DEC ...
SQL> select sid,serial# from v$session where SID=1493;
SID SERIAL#
---------- ----------
1493 78
Now what? KILL IT!

SQL> alter system kill session '1493, 78'
alter system kill session '1493, 78'
*
ERROR at line 1:
ORA-00031: session marked for kill

Some time has passed and session still exists... Mutex problem persists of course too. So... kill -9 from OS layer...

Bingo!

We have hardcore workaround! Kill -9 ;-)

I hate mutex!

AppPeople called that Big-App is not working fine... Shit!

Short check...

SQL> select event, count(*) from v$session_wait group by event order by 2;

EVENT COUNT(*)
---------------------------------------------------------------- ----------
(...)
jobq slave wait 4
log file sync 20
db file sequential read 21
rdbms ipc message 24
latch: cache buffers chains 72
cursor: pin S 155
SQL*Net message from client 772


... More than 100 sessions waiting on cursor: pin S event! WTF!

... Metalink examination leads to bug 6968152...

(...)
If default "_kks_use_mutex_pin=TRUE"(MUTEX) setting compared to "_kks_use_mutex_pin=FALSE"(NON-MUTEX) setting in the same processing situation,
the high value of the wait event of "cursor:pin S" is shown, at the same time, the CPU usage rate become high load, and executable transactions decreases.
So performance degraded more by MUTEX("_kks_use_mutex_pin=TRUE" setting) than by NON-MUTEX("_kks_use_mutex_pin=FALSE" setting).
(...)


... So I have checked my CPU's...

System: zulu01
Load averages: 15.46, 15.35, 15.08
1301 processes: 1005 sleeping, 295 running, 1 zombie

I hate mutex! Why new amazing stuff always doesn't work! They have said mutex will solve latches problems. Maybe even improve! ;-)

That is theory. Practice is quite different - I need to disable it by _kss_use_mutex_pin=FALSE :-) Shit!

Easy way to find backup controlfile trace...

Yesterday I was wondering how to generate backup control file trace and than easily find appropriate trace file. The answer is very simple. We need to use tracefile_identifier!

SQL> alter session set tracefile_identifier='bcf';
Session altered.

SQL> alter database backup controlfile to trace;
Database altered.

SQL> exit
(...)

oralab01:/home/oracle> cd $ORACLE_HOME/admin/cbd/udump

oralab01:/opt/oracle/admin/mydb/udump> ls -latr *bcf*
-rw-r----- 1 oracle dba 91521 Apr 22 11:09
mydb_ora_7093_bcf.trc

oralab01:/opt/oracle/admin/mydb/udump> more mydb_ora_7093_bcf.trc
(...)
STARTUP NOMOUNT CREATE CONTROLFILE REUSE DATABASE "mydb" NORESETLOGS FORCE LOGGING ARCHIVELOG
MAXLOGFILES 6
MAXLOGMEMBERS 3
(...)

Monday, April 20, 2009

Oracle is buying SUN

Hello,

I just found out - Oracle starts acquiring process with SUN.

http://www.oracle.com/sun/index.html

http://www.sun.com/third-party/global/oracle/index.jsp

http://money.cnn.com/2009/04/20/technology/Oracle_Sun/index.htm?postversion=2009042010


BTW - press release page is down - hosting service shame on you ;)


In my opinion this is first step to provide Oracle as a black hole system ;)
Sorry first one was Oracle with HP - I'm wondering what will happen with that.
Anyway it will be very interesting to see first Oracle hardware ;) maybe with
some optimization on hardware level (cache fusion) ??

regards,
Marcin

Tuesday, April 14, 2009

Discoverer admin API

Hi,

Did you ever use a command line to do something ?
It is nice to have a strong CMD API to create a scripts and automate some work.
As far as I remember Oracle has a very good tools to perform that kind of task.
I was very surprised when I was trying use EULAPI.bat to automate a implementation
of this same EUL for 14 installations. I have create very simple script, I have started it
- hurray - no errors - command complete but ... nothing changed in database.

Two examples :

1.
eulapi.bat -connect eul_own/eul_own@own -grant_privilege -business_area_access "BI%" -wildcard -user USER1

2.
eulapi.bat -connect eul_own/eul_own@own -grant_privilege -user USER1 -business_area_access "BI%" -wildcard


Number 1 is not working, number 2 is working.
What is a difference ? A location of NAMED parameter "user".
Ok, parameters can be order based but why we need to put a name before it ?
and why there was "command successful" at the end of command ?
Another one nice Oracle "feature" :-)

So if you are trying use EULAPI remember that you have put a correct order of parameters.

regards,
Marcin

Saturday, April 4, 2009

CBO issues

Hello,

This another proof that lower cost doesn't mean fastest execution plan.
I had to tune one of Oracle Discoverer report which was based on view defined in database.
My first approach was to add some indexes because there was none. According
to good practices I have been added a primary key and indexes on FK and join columns.

The looks more less like that

select
distinct
s.col1
s.col2
P.col1
P.col2
L.col1
C.col1
H.col1
G.col1
from
tab1 s,
tab2 c,
tab3 P,
tab4 H,
tab5 G,
tab6 B,
tab7 I,
tab8 L
where 1=1
AND G.col1 = 'Y'
AND G.col2 = P.col2
AND G.col2 = H.col2(+)
...
and b.col3 = (select min(b2.col3)
from tab6 b2
where b2.col1 = b.col1
and b2.col2 = b.col2
and b2.col3 = b.col3
and b2.col4 >= s.col4)
and c.col3 = (select max(f.col3)
from tab8 L2
where l2.col1 = L.col1
and l2.col2 = L.col2
and l2.col3 <= L.col3);

So there was two subqueries correlated with main query.
If there were indexes on tab6.col1, tab6.col2 and tab6.col3 for second subquery
and indexes on b1.col1, b1.col2 and b1.col3 for first subquery.
CBO decide that to use bitmap index conversion and bitmap join (using bitmap conversion to rowids and bitmap conversion from rowids) for that part which was cheaper then normal join (unfortunately for CBO only)




There is a original part of execution plan:



Whole cost of query was 8633.
Execution plan was about 3 do 3.5 minutes.

I have had a problem with that conversion before so I decide to disable it using
hidden parameter _b_tree_bitmap_plans. By default it is set to true and
it allow index conversion. I had changed it to false and execution plan had been changed to:


Whole cost had been changed to 9145, but execution time has been reduced to 30 secs.

Parameter change is a very significant change to whole system and I would like to avoid that.
I had performed more investigation using Oracle event 10053 to find out why it is a case.
All conversions had been used a indexes join so I have
decided to drop one index from table tab6 called tab_col2_idx. That was it.
Query was executed faster and hidden parameter have been unchanged.

Right now I have to investigate more deeper why CBO decide to use bitmap conversion and if it is possible to avoid that in different way. But it has to wait until Monday when I will have a access to trace files.

regards,
Marcin