Thursday, August 16, 2007

Order by - sort versus index full scan

High water mark is a real problem for some tables in database.

In my case I have a table which is a part of interface between two systems and once a week this table is filled with a lot of rows (about 1 000 000). When rows are processed, interface service remove them from the table. But there is a other service which check how many rows is in table and run two statement:

  • select count(*) from interface_table
  • select col1, col2, col3 from interface table order by col1,col2
with out the index both query use full scan table access so due to high water mark this statement takes a lot of time.

First statement is quite easy to tune is enough to add
index but second one is more complicated. One can smile here and
tell "We need to add one hint. that's all." Of course, we can if we have a access to source code.
In this case we have not.

Only way to change table access in second case is use a "order by" clause and knowledge how this clause is executing by Oracle.In Oracle database are two possibilities to execute "orderby" clause. One is use "SORT", second one is use "INDEX FULL SCAN". Oracle can use second option if table has primary key and columns in primary key are in this same order as columns in "ORDER BY" clause. Next restriction is that columns can not be varchar2.
Please find below examples of problem:

Table definition:

CREATE TABLE marcin_test
(
ROWSTAMP number(16),
INTERNAME VARCHAR2(50 BYTE),
TRANSID NUMBER(16),
TRANSORDER NUMBER(16),
ERRORCOUNT INTEGER,
ERRORFILE VARCHAR2(50 BYTE),
EMAILSENT VARCHAR2(10 BYTE)
)

begin
for i in 11..50000 loop
insert into marcin_test values (i,'aaaa',i,i,1,'a','a');
end loop;
commit;
end;
/


Testing SQL statement:

select intername, transid, transorder, errorfile
from marcin_test
order by transid, transorder;

Results:

  • no primary key – table with 50000 rows
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=523 Card=50000 Bytes=1000000)
1 0 SORT (ORDER BY) (Cost=523 Card=50000 Bytes=1000000)
2 1 TABLE ACCESS (FULL) OF 'MARCIN_TEST' (Cost=95 Card=50000 Bytes=1000000)

  • no primary key – table with 100 rows – HWM was set by 50000 rows
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=108 Card=10 Bytes=160)
1 0 SORT (ORDER BY) (Cost=108 Card=10 Bytes=160)
2 1 TABLE ACCESS (FULL) OF 'MARCIN_TEST' (Cost=95 Card=10 Bytes=160)

  • primary key – table with 50000 rows
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=523 Card=50000 Bytes=1000000)
1 0 SORT (ORDER BY) (Cost=523 Card=50000 Bytes=1000000)
2 1 TABLE ACCESS (FULL) OF 'MARCIN_TEST' (Cost=95 Card=50000 Bytes=1000000)

  • primary key – table with 100 rows – HWM was set by 50000 rows
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=10 Bytes=160)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'MARCIN_TEST' (Cost=3 Card=10 Bytes=160)
2 1 INDEX (FULL SCAN) OF 'T1_PK_2' (UNIQUE) (Cost=2 Card=10)

This examples shows that if we have a “ORDER BY” clause and if we can add primary key to table, we can avoid full scan on table with very high HWM and reduce response time for this
query.



0 comments: