Wednesday, February 4, 2009

RMAN - random errors over the years

Hi,


I have been working with RMAN for 8 years and I'm still wondering why some RMAN errors seem to be taken from /dev/random ;)

Last example:

Environment: Linux 32-bit - Oracle 10g 10.2.0.4 on ASM

Performed steps:
  1. Drop the existing test DB from ASM - using drop database
  2. Copy the backup from the production server to the test server
  3. Restore the controlfile from the new location
  4. Mount the database
After that I wanted to restore the database, so I cataloged all the necessary backup pieces
in the controlfile and checked them using the list backupset command.
There was one backupset containing all datafiles with correct status, so it seemed simple: let's try to restore
the DB.
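The catalog and check steps above can be sketched like this (the backup location is a hypothetical example):

```sql
RMAN> catalog start with '/backup/orcl/';
RMAN> list backupset;
```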

RMAN>restore database;

creating datafile No=1 name=+DATA/oracle/orcl/datafile/o1_mf_system_3n5w1nky_.dbf
released channel: t1
released channel: t2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 07/04/2008 10:54:29
ORA-01180: can not create datafile 1
ORA-01110: data file 1: '+DATA/dataprd/ORCL/datafile/o1_mf_system_3n5w1nky_.dbf'

Yeah, nice error.
There are some notes on Metalink related to duplicated incarnations and a corrupted controlfile
(BTW, one of them suggests recreating the controlfile from the command line before restoring the datafiles - is it really possible to recreate a controlfile without datafiles?)

Anyway, that was not my case.

The solution is very simple. I found out that during the catalog phase RMAN scans the existing flash recovery area, and it found archive logs in a backupset from the previous (dropped) database.
Because those archive logs belonged to a different incarnation, the incarnation of my new database was changed too. And now we have strange RMAN behaviour:

RMAN> list backupset;

still displays valid backups for that incarnation

RMAN> restore database;

raises the error shown above

Solution:

RMAN> reset database to incarnation xxx;

where xxx is the previous incarnation of the database.
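The incarnation key can be looked up first with list incarnation (the key value below is hypothetical):

```sql
RMAN> list incarnation of database;
RMAN> reset database to incarnation 2;
```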

I can understand that Oracle could use a backup from a previous incarnation in the new one (but why?),
but why is there such a stupid error about datafile number 1?

Is it impossible to display something more useful, like "there is no backup for that incarnation"?

PS.
All databases have the same DBID - they are clones.
I know it is a bad idea to keep one DBID for many databases, but I thought that with an RMAN catalog there would be no issue.

regards,
Marcin

Tuesday, January 27, 2009

Graceful switchover in standard edition

I have implemented a lot of standby databases on Oracle Standard Edition. Until now none of our customers had asked for a graceful switchover, but at last it happened.



I spent a few hours thinking about whether it was possible, and when a draft solution came to my mind I did some research on Oracle Metalink and found an article from 1999 about graceful switchover in Oracle 8 and 8i - interesting, because it was written before Data Guard and this functionality were established (Metalink Doc ID: 76450.1, Graceful Switchover and Switchback of Oracle Standby Databases).



The article's findings and my original idea are very close: to switch over a database, it is required to copy the online redo logs and control file. Everything looks straightforward when the database files, online redo logs, and control files are on a filesystem. But what about a database placed on ASM, where there is no way to copy redo logs? Is it still possible? Yes, it is. It is some kind of workaround and it requires a little more work, but I was able to perform a switchover between two databases using ASM.



The solution for ASM-based databases is to use the mirroring feature for online redo logs. A new member of each group has to be placed on a filesystem instead of an ASM disk group. After that change, we are able to perform a graceful switchover using the steps described in the Oracle document. At the end, the additional (temporary) member of each redo group can be dropped.
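A minimal sketch of that step (paths and group numbers are hypothetical; repeat for every redo log group):

```sql
-- Add a temporary filesystem member to each redo log group
ALTER DATABASE ADD LOGFILE MEMBER '/u01/oradata/orcl/redo01b.log' TO GROUP 1;
ALTER DATABASE ADD LOGFILE MEMBER '/u01/oradata/orcl/redo02b.log' TO GROUP 2;

-- After the switchover, drop the temporary members again
ALTER DATABASE DROP LOGFILE MEMBER '/u01/oradata/orcl/redo01b.log';
ALTER DATABASE DROP LOGFILE MEMBER '/u01/oradata/orcl/redo02b.log';
```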

Wednesday, January 7, 2009

Nice bug

Today I was trying to enable automatic patch updates in 11g.
I opened the configuration page and typed my email and password; when I pressed the Apply
button I saw a nice message in red:

Invalid Data - Error: apply failed ORA-12899: value too large for column
"SYSMAN"."MGMT_ARU_CREDENTIALS"."ARU_USERNAME" (actual: 80, maximum: 64)
ORA-06512: at "SYSMAN.MGMT_CREDENTIAL", line 1482 ORA-06512: at line 1

On Metalink your account is your email address, so all DBAs with long first names or surnames have a problem ;)
I did some manual research and found a solution - use a shorter account name.

BTW this is solved in 11.1.0.7
From Metalink note 470696.1

The next Release of DB Control (11.1.0.7) will include the fix. The maximum length of the Metalink Username will be 255 characters (as Metalink username can have up to 255 characters).

I will try that.

Update:
Even in 11.1.0.7 on Linux I could not use my Metalink account.

Wednesday, July 16, 2008

RAC on Teide


During my last vacation I was asked to do some phone consulting
for an Oracle 10g RAC database creation. So far it was my "highest" Oracle installation,
because I was almost at the top of Pico de Teide - about 3550 m ;)
See the picture taken by my wife.

Thursday, June 12, 2008

Gathering statistics

Today I found a very nice and useful article about gathering statistics in 10g.
Check it out http://structureddata.org/2008/03/26/choosing-an-optimal-stats-gathering-strategy/

Monday, June 2, 2008

DataGuard and ORA-00600

Last week I was asked to configure Data Guard on Oracle 10.2.0.3 64-bit on RedHat.
Everything was OK, but I forgot to set a different "db_unique_name" for the production
and Data Guard databases. I successfully started Data Guard in maximum performance mode, but when I tried to start it in a more secure mode I bumped into an ORA-00600 error - ORA-00600: internal error code, arguments: [krslagap.1], [], [], [], [], [], [], []

I spent almost one day trying to find out why this error appeared between mount and open mode.
Finally I rechecked everything and found the root cause.
So, just for your information: always double-check all parameters.
BTW - it would be better to inform the user about a missing or wrong parameter than to raise an ORA-00600.
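A minimal sketch of the missing setting, with hypothetical names - db_unique_name is a static parameter, so it has to be set in the spfile and needs a restart:

```sql
-- On the primary (name is an assumption):
ALTER SYSTEM SET db_unique_name = 'ORCL_PRIM' SCOPE = SPFILE;

-- On the standby (name is an assumption):
ALTER SYSTEM SET db_unique_name = 'ORCL_STBY' SCOPE = SPFILE;
```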

Tuesday, April 15, 2008

Big/small database block and indexes

There are a lot of myths about big and small database blocks for indexes.
Since 9i it has been possible to create tablespaces with a different block size, and a lot of people
think this is a solution to speed up indexes. It was my assumption too, but I decided to
do some research on it. You can find my results and some remarks below.
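As a reminder, a tablespace with a non-default block size needs a matching buffer cache configured first; a sketch with hypothetical sizes and paths:

```sql
-- A 16 kB tablespace requires a 16 kB buffer cache in an 8 kB database
ALTER SYSTEM SET db_16k_cache_size = 64M;

CREATE TABLESPACE idx16k
  DATAFILE '/u01/oradata/orcl/idx16k01.dbf' SIZE 1G
  BLOCKSIZE 16K;
```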

Test nr 1.

Table definition

CREATE TABLE TEST1
(
A VARCHAR2(18 CHAR),
B VARCHAR2(18 CHAR),
B1 VARCHAR2(15 CHAR),
B2 VARCHAR2(16 CHAR),
B3 VARCHAR2(8 CHAR),
C1 DATE,
C2 DATE,
DUR VARCHAR2(6 CHAR),
E VARCHAR2(3 CHAR),
I VARCHAR2(40 CHAR)
);


There are 89999997 rows in this table,
and column A has 914174 distinct values.

I wrote a piece of PL/SQL which looks
for a random A value and then fetches all rows with this value
in a loop. The time needed to look up the value and fetch all its rows was
measured with timestamp variables in PL/SQL and reported
to standard output. The script was executed twice in 'dry run' mode
after a database shutdown, and then it was run in measured mode.
The same scenario was performed 12 times: 6 times for an 8 kB block and
6 times for a 16 kB block.

declare

  i          test1.dur%type;        -- holds the last fetched DUR value
  c          int;                   -- row counter
  v_begin    timestamp(9);
  v_end      timestamp(9);
  v_interval interval day to second;

begin

  for w in 1 .. 1000 loop
    v_begin := systimestamp;
    c := 0;
    -- pick a random A value and fetch all rows having it
    for r in (select a, b, dur
                from test1
               where a = (select lpad(trunc(dbms_random.value(0, 1000000)), 18, '0')
                            from dual)) loop
      i := r.dur;
      c := c + 1;
    end loop;
    v_end := systimestamp;
    v_interval := v_end - v_begin;
    if c <> 0 then
      dbms_output.put_line(to_char(v_interval) || ' ' || c);
    end if;
  end loop;

end;
/


Here are results:




So as you can see, the response times are almost the same and neither block size
is a winner.

When I investigated more deeply, I found that the main SQL statement
uses an index range scan to find the rowids of table blocks. In the raw trace files
I found, as expected, "db file sequential read" events, so Oracle was using single-block
reads to access both the index and the table.
If the database block size is below the maximum I/O request size, a block read will always be executed in one I/O call.

Here is an example result from tkprof:

8 kB database block



16 kB database block




As you can see, both executions of the SQL have the same execution plan, but with the 8 kB block the SQL had a better response time than with 16 kB. There is one difference between the two scenarios: with the 16 kB block there were several "db file scattered read" events, which means there were multi-block operations. In the raw trace file I found that each multi-block operation read only 2 or 3 blocks.

Another side effect of a block size change is a change in the SQL cost.
The CBO calculates the index cost using the number of leaf blocks and the clustering factor.
With a 16 kB index tablespace there will be fewer leaf blocks, but the clustering factor can be bigger than in an 8 kB index tablespace.

Here are results from my example:

8 kB - cost 106

Leaf Blocks - 411768
Clustering Factor - 86883030

16 kB

Leaf Blocks - 211117
Clustering Factor - 90356662
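These numbers fit the classic index range scan cost estimate, roughly blevel + ceil(leaf_blocks * selectivity) + ceil(clustering_factor * selectivity), with selectivity = 1/distinct_keys for a single key value. A sketch of how to check it yourself, assuming the index on column A is called TEST1_A_IDX (the name is hypothetical):

```sql
-- Rough range scan cost for one key value, using the classic formula
SELECT blevel
       + CEIL(leaf_blocks / distinct_keys)
       + CEIL(clustering_factor / distinct_keys) AS estimated_cost
FROM   user_indexes
WHERE  index_name = 'TEST1_A_IDX';
```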


Those are all my findings. I'm waiting for your comments and remarks.


Update - 11.05.2008

With 4 kB blocks performance is even worse. I will update all the pictures soon.