Tuesday, April 15, 2008

Big/small database block and indexes

There is a lot of myths about big and small database blocks for indexes.
From 9i there is a possibility to create tablespaces with different block size. A lot of people
think that there a solution for speed up a indexes. It was my assumption too, but I decided to
make some research on that. You can find my results and some remarks below.

Test nr 1.

Table definition


There are 89999997 rows in this table,
and column A has 914174 different values.

I've wrote a piece of PL/SQL script which was looking
for random A value and then fetch all rows with this value
in loop. Time of looking for value and fetching all rows for it was
measured by timestamp variable in PL/SQL and then reported
to standard output. This script was executed in 'dry run' mode
2 times after database shutdown and then it was run in measured mode.
This same scenario was performed 12 times - 6 times for 8 kB block and
6 times for 16 kB.


i int;
c int;
v_begin timestamp(9);
v_end timestamp(9);


for w in 1..1000 loop
v_begin := SYSTIMESTAMP;
for r in (select A, B, DUR from test where A = (select lpad(trunc(dbms_random.value(0,1000000)),18,'0') from dual)) loop
end loop;
v_interval := v_end - v_begin;
if (c<>0) then
dbms_output.put_line(to_char(v_interval) || ' ' || c);
end if;
end loop;

Here are results:

So as you can see response time is almost the same and none of block size
is a winner.

When I made some more depth investigation, I found out that main SQL statement
is using index range scan to find rowid of block in table. In raw trace files,
I have found, as expected "db file sequential read" so Oracle was using single block
read to access a index and also single block read was used to access the table.
If database block size if below max IO request size it will be always executed in one step.

Here is example result from tkprof:

8 kB database block

16 kB database block

As you can see both execution of SQL has this same execution plan. But in case of 8 kB block SQL had better response time, then in case of 16 kB. There are one difference in executing both scenario - in 16 kB block there was several "
db file scattered read" events which means that there was multi blocks operations. In raw trace file I have found that each multi blocks operations were only 2 or 3 blocks read.

Another side effects, related to block size change are changes in SQL cost.
CBO is calculating index cost using leaf block number and clustering factor.
In case of 16 kB index tablespace, where will be less leaf blocks but clustering factor could be bigger then in 8 kB index tablespace.

Here are results from my example:

8 kB - cost 106

Leaf Blocks - 411768
Custering Factor - 86883030

16 kB

Leaf Blocks - 211117
Custering Factor - 90356662

There are all my findings. I'm waiting for any comments and remarks.

Update - 11.05.2008

With 4 kB blocks performance is even worse. I will update all the pictures soon.


Jakub Wartak said...

Maybe sick idea, but have you tried measuring with index tablespace at 4kB ?

What OS? What DB version? What FS ?

If it is Linux have you tried:
1) to tune /sys/block/*/queue/{max_sectors_kb, scheduler==deadline, nr_requests, read_ahead_kb}
2) if using ext3 what is block size? Are you enforcing noatime?
3) disablng NCQ if using SATA? (echo 1 > /sys/block/$DEV/device/queue_depth)

Do you have DirectIO turned on?

Pioro said...


No I never tried it on 4 kB, but I'm planning to.
My platform is Linux CentOS 4,
Oracle and ext3.

Is your Linux setup speed up
a big block read time ?

I've performed both test on this same OS settings - only one difference was database block size in tablespace .

Jonathan Lewis said...

I know it's nearly 9 months late, but here are a few thoughts:

According to your figures, there are roughly 100 rows per key value - but the trace files show an average of 21 rows per execution. This suggests a large variation in number of values per key,including lots of sample values where there were no rows. This could lead to cases where the difference in the actual data selected was more important than the difference in block size. If you started your test with a call to dbms_random.seed(0) then you could guarantee that your "random" queries ran through the same sequence of values for the 8KB and 16KB tests.

It is interesting to note in the tkprof outputs that most of the time recorded is against the table block reads (7 or 8 seconds for indexes, an increment of 115 to 120 seconds on the table). Even though your results are inconclusive, they do highlight an important aspect of the "big blocks for indexes" argument - in cases like yours, most of the work is in the table, trying to optimise the index is not the significant target.

The appearance of the db file scattered reads in the 16KB test sounds like Oracle's ability to do pre-fetching (i.e. picking up multiple blocks in anticipation of their need). This could be table blocks acquired by examining the leaf blocks of indexes, or leaf blocks based on the values in branch blocks. The CKPT process controls this run-time feature based on "previous success". It would be quite interesting to analyze how many of your "db file scattered read" requests were against the index and how many against the table. (I would guess that they were all against the table, given the description of the data setup).