Sunday, July 24, 2011

RMAN Database duplication without target connectivity can corrupt target database

Latest version of Oracle 11g R2 has new Recovery Manager functionality - database duplication from backup without connectivity to target database. During tests I have found unexpected behavior of that functionality. I'm still not sure if it is bug or RMAN is doing it by design but if backup doesn't contain all files and SKIP TABLESPACE is not included, duplication process can overwrite target database files. When database is using ASM it can happen regardless of database state. Due to file locking mechanism this issue occur on file system only when target database is down.

All has been performed using Oracle version 11.2.0.2 on Red Hat 5.3 running target and auxiliary database on one server using file system. I have tested it with ASM as well but log files are gone together with my VM machine.

Target database is down, Backup has been taken without one tablespace, 
no SKIP TABLESPACE clause.

Duplication script:
run {
set newname for tempefile 1 to '/opt/app/oracle/oradata/clon/temp01.dbf';
duplicate target database to clon
SPFILE
    PARAMETER_VALUE_CONVERT '/opt/app/oracle/admin/testdb/',
                            '/opt/app/oracle/admin/clon/'
    SET SGA_MAX_SIZE '500M'
    SET CONTROL_FILES '/opt/app/oracle/oradata/clon/control01.ctl','/opt/app/oracle/oradata/clon/control02.ctl'
    SET SGA_TARGET '450M'
    SET CLUSTER_DATABASE 'false'
    SET db_create_file_dest          '/opt/app/oracle/oradata/clon/'
    SET db_create_online_log_dest_1  '/opt/app/oracle/oradata/clon/'
    SET LOG_FILE_NAME_CONVERT  '/opt/app/oracle/oradata/testdb/','/opt/app/oracle/oradata/clon/'
    SET DB_FILE_NAME_CONVERT   '/opt/app/oracle/oradata/testdb/','/opt/app/oracle/oradata/clon/'
    BACKUP LOCATION '/opt/app/oracle/backup/';
}
Let's take a look on output - unimportant lines have been skipped
Recovery Manager: Release 11.2.0.2.0 - Production on Tue Jun 21 07:17:34 2011
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
connected to auxiliary database: CLON (not mounted)

Starting Duplicate Db at 21-JUN-11
...
Starting restore at 21-JUN-11
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=171 device type=DISK

channel ORA_AUX_DISK_1: restoring control file
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:03
output file name=/opt/app/oracle/oradata/clon/control01.ctl
output file name=/opt/app/oracle/oradata/clon/control02.ctl
Finished restore at 21-JUN-11

database mounted
released channel: ORA_AUX_DISK_1
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=171 device type=DISK

contents of Memory Script:
{
   set until scn  95159515;
   set newname for datafile  1 to 
 "/opt/app/oracle/oradata/clon/system-01.dbf";
   set newname for datafile  2 to 
 "/opt/app/oracle/oradata/clon/undo_t1-01.dbf";
   set newname for datafile  3 to 
 "/opt/app/oracle/oradata/clon/sysaux-01.dbf";
   set newname for datafile  4 to 
 "/opt/app/oracle/oradata/clon/administrator-01.dbf";
   set newname for datafile  5 to 
 "/opt/app/oracle/oradata/clon/administrator_idx-01.dbf";
   set newname for datafile  6 to 
 "/opt/app/oracle/oradata/clon/replication-01.dbf";
   set newname for datafile  7 to 
 "/opt/app/oracle/oradata/clon/replication_idx-01.dbf";
   set newname for datafile  8 to 
 "/opt/app/oracle/oradata/clon/soe.dbf";
   set newname for datafile  9 to 
 "/opt/app/oracle/oradata/clon/soeindex.dbf";
   set newname for datafile  10 to 
 "/opt/app/oracle/oradata/clon/users-01.dbf";
   restore
   clone database
   ;
}
executing Memory Script
executing command: SET until clause
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
So far so good - new name has been set for all files. What is next
file 9 is excluded from whole database backup
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00001 to /opt/app/oracle/oradata/clon/system-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00002 to /opt/app/oracle/oradata/clon/undo_t1-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00003 to /opt/app/oracle/oradata/clon/sysaux-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00004 to /opt/app/oracle/oradata/clon/administrator-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00005 to /opt/app/oracle/oradata/clon/administrator_idx-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00006 to /opt/app/oracle/oradata/clon/replication-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00007 to /opt/app/oracle/oradata/clon/replication_idx-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00008 to /opt/app/oracle/oradata/clon/soe.dbf
channel ORA_AUX_DISK_1: restoring datafile 00010 to /opt/app/oracle/oradata/clon/users-01.dbf
channel ORA_AUX_DISK_1: reading from backup piece /opt/app/oracle/backup/bck_13_testdb_1
channel ORA_AUX_DISK_1: piece handle=/opt/app/oracle/backup/bck_13_testdb_1 tag=TAG20110621T064346
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:06:36
failover to previous backup

Finished restore at 21-JUN-11
All files from backupset have been restored and Oracle realize that one file is missing. Keep in mind that new name has been prepared for all files.
contents of Memory Script:
{
   switch clone datafile all;
}
executing Memory Script

datafile 1 switched to datafile copy
input datafile copy RECID=10 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/system-01.dbf
datafile 2 switched to datafile copy
input datafile copy RECID=11 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/undo_t1-01.dbf
datafile 3 switched to datafile copy
input datafile copy RECID=12 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/sysaux-01.dbf
datafile 4 switched to datafile copy
input datafile copy RECID=13 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/administrator-01.dbf
datafile 5 switched to datafile copy
input datafile copy RECID=14 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/administrator_idx-01.dbf
datafile 6 switched to datafile copy
input datafile copy RECID=15 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/replication-01.dbf
datafile 7 switched to datafile copy
input datafile copy RECID=16 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/replication_idx-01.dbf
datafile 8 switched to datafile copy
input datafile copy RECID=17 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/soe.dbf
datafile 10 switched to datafile copy
input datafile copy RECID=18 STAMP=754384700 file name=/opt/app/oracle/oradata/clon/users-01.dbf
This switch command is starting this issue - missing file hasn't been switch to new location
contents of Memory Script:
{
   set until scn  95159515;
   recover
   clone database
   delete archivelog
   ;
}
executing Memory Script
executing command: SET until clause
Starting recover at 21-JUN-11
using channel ORA_AUX_DISK_1
starting media recovery
channel ORA_AUX_DISK_1: starting archived log restore to default destination
channel ORA_AUX_DISK_1: restoring archived log
archived log thread=1 sequence=1024
channel ORA_AUX_DISK_1: restoring archived log
archived log thread=1 sequence=1025
channel ORA_AUX_DISK_1: reading from backup piece /opt/app/oracle/backup/bck_arch_16_testdb_1
channel ORA_AUX_DISK_1: piece handle=/opt/app/oracle/backup/bck_arch_16_testdb_1 tag=TAG20110621T070623
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:03
archived log file name=/opt/app/oracle/fra/CLON/archivelog/2011_06_21/o1_mf_1_1024_700kfxcx_.arc thread=1 sequence=1024
channel clone_default: deleting archived log(s)
archived log file name=/opt/app/oracle/fra/CLON/archivelog/2011_06_21/o1_mf_1_1024_700kfxcx_.arc RECID=2 STAMP=754384704
archived log file name=/opt/app/oracle/fra/CLON/archivelog/2011_06_21/o1_mf_1_1025_700kfxg6_.arc thread=1 sequence=1025
Oracle Error: 
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01152: file 9 was not restored from a sufficiently old backup 
ORA-01110: data file 9: '/opt/app/oracle/oradata/testdb/soeindex.dbf'
The worst part is here - RMAN was trying to apply logs to other database using entries from control file - the TARGET database entries. DBID at that time for target and clone database are still the same and target database was down so there are no locks. Nothing could prevent this process to overwrite target database file header.

What happen to target database ? The following error will be found in target alert.log file

Read of datafile '/opt/app/oracle/oradata/testdb/soeindex.dbf' (fno 9) header failed with ORA-01206
Rereading datafile 9 header failed with ORA-01206
Errors in file /opt/app/oracle/diag/rdbms/testdb/testdb/trace/testdb_dbw0_4984.trc:
ORA-01186: file 9 failed verification tests
ORA-01122: database file 9 failed verification check
File in target database has been corrupted during duplication process and it has to be restored from backup or dropped.

Target database is running, backup has been taken without one table space, 
no SKIP TABLESPACE clause.
Duplication process is starting in similar way to previous scenario but instead of overwriting target file the following error is reporting during recovery process.
RMAN-11001: Oracle Error: 
ORA-00283: recovery session canceled due to errors
ORA-01110: data file 9: '/opt/app/oracle/oradata/testdb/soeindex.dbf'
ORA-01157: cannot identify/lock data file 9 - see DBWR trace file
ORA-01110: data file 9: '/opt/app/oracle/oradata/testdb/soeindex.dbf'
This time duplication process failed but target database wasn't touched.

Target database is down, Backup has been taken without one tablespace, 
SKIP TABLESPACE clause has been setup
This time backup still doesn't have all files but SKIP_TABLESPACE has been added to script. Recovery manager is working now without any issues
Recovery Manager: Release 11.2.0.2.0 - Production on Tue Jun 21 09:32:21 2011
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
connected to auxiliary database: CLON (not mounted)
RMAN> run {
2> set newname for tempfile 1 to '/opt/app/oracle/oradata/clon/temp01.dbf';
3>                                                             
4> duplicate target database to clon                           
5> SPFILE                                                      
6>     PARAMETER_VALUE_CONVERT '/opt/app/oracle/admin/testdb/',
7>                             '/opt/app/oracle/admin/clon/'   
8>     SET SGA_MAX_SIZE '500M'                                 
9>     SET CONTROL_FILES '/opt/app/oracle/oradata/clon/control01.ctl','/opt/app/oracle/oradata/clon/control02.ctl'
10>     SET SGA_TARGET '450M'        
11>     SET CLUSTER_DATABASE 'false' 
12>     SET db_create_file_dest          '/opt/app/oracle/oradata/clon/'
13>     SET db_create_online_log_dest_1  '/opt/app/oracle/oradata/clon/'
14>     SET LOG_FILE_NAME_CONVERT  '/opt/app/oracle/oradata/testdb/','/opt/app/oracle/oradata/clon/'
15>     SET DB_FILE_NAME_CONVERT   '/opt/app/oracle/oradata/testdb/','/opt/app/oracle/oradata/clon/'
16>     SKIP TABLESPACE 'SOEINDEX'                       
17>     UNDO TABLESPACE 'UNDO_T1'                                                                   
18>     BACKUP LOCATION '/opt/app/oracle/backup/';                                                  
19> }     
20> 
executing command: SET NEWNAME

Starting Duplicate Db at 21-JUN-11
...
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=171 device type=DISK
Not connected to TARGET or TARGET not open, cannot verify that subset of tablespaces is self-contained
Not connected to TARGET, cannot verify that set of tablespaces being duplicated does not have SYS objects
Datafile 9 skipped by request

contents of Memory Script:
{
   set until scn  95159515;
   set newname for datafile  1 to 
 "/opt/app/oracle/oradata/clon/system-01.dbf";
   set newname for datafile  2 to 
 "/opt/app/oracle/oradata/clon/undo_t1-01.dbf";
   set newname for datafile  3 to 
 "/opt/app/oracle/oradata/clon/sysaux-01.dbf";
   set newname for datafile  4 to 
 "/opt/app/oracle/oradata/clon/administrator-01.dbf";
   set newname for datafile  5 to 
 "/opt/app/oracle/oradata/clon/administrator_idx-01.dbf";
   set newname for datafile  6 to 
 "/opt/app/oracle/oradata/clon/replication-01.dbf";
   set newname for datafile  7 to 
 "/opt/app/oracle/oradata/clon/replication_idx-01.dbf";
   set newname for datafile  8 to 
 "/opt/app/oracle/oradata/clon/soe.dbf";
   set newname for datafile  10 to 
 "/opt/app/oracle/oradata/clon/users-01.dbf";
   restore
   clone database
   skip forever tablespace  "SOEINDEX"   ;
}
executing Memory Script
executing command: SET until clause
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME

Starting restore at 21-JUN-11
using channel ORA_AUX_DISK_1

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00001 to /opt/app/oracle/oradata/clon/system-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00002 to /opt/app/oracle/oradata/clon/undo_t1-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00003 to /opt/app/oracle/oradata/clon/sysaux-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00004 to /opt/app/oracle/oradata/clon/administrator-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00005 to /opt/app/oracle/oradata/clon/administrator_idx-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00006 to /opt/app/oracle/oradata/clon/replication-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00007 to /opt/app/oracle/oradata/clon/replication_idx-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00008 to /opt/app/oracle/oradata/clon/soe.dbf
channel ORA_AUX_DISK_1: restoring datafile 00010 to /opt/app/oracle/oradata/clon/users-01.dbf
channel ORA_AUX_DISK_1: reading from backup piece /opt/app/oracle/backup/bck_13_testdb_1
channel ORA_AUX_DISK_1: piece handle=/opt/app/oracle/backup/bck_13_testdb_1 tag=TAG20110621T064346
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:06:25
Finished restore at 21-JUN-11

contents of Memory Script:
{
   switch clone datafile all;
}
executing Memory Script

datafile 1 switched to datafile copy
input datafile copy RECID=10 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/system-01.dbf
datafile 2 switched to datafile copy
input datafile copy RECID=11 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/undo_t1-01.dbf
datafile 3 switched to datafile copy
input datafile copy RECID=12 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/sysaux-01.dbf
datafile 4 switched to datafile copy
input datafile copy RECID=13 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/administrator-01.dbf
datafile 5 switched to datafile copy
input datafile copy RECID=14 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/administrator_idx-01.dbf
datafile 6 switched to datafile copy
input datafile copy RECID=15 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/replication-01.dbf
datafile 7 switched to datafile copy
input datafile copy RECID=16 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/replication_idx-01.dbf
datafile 8 switched to datafile copy
input datafile copy RECID=17 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/soe.dbf
datafile 10 switched to datafile copy
input datafile copy RECID=18 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/users-01.dbf

contents of Memory Script:
{
   set until scn  95159515;
   recover
   clone database
   skip forever tablespace  "SOEINDEX"    delete archivelog
   ;
}
executing Memory Script

executing command: SET until clause

Starting recover at 21-JUN-11
using channel ORA_AUX_DISK_1

Executing: alter database datafile 9 offline drop
starting media recovery

channel ORA_AUX_DISK_1: starting archived log restore to default destination
channel ORA_AUX_DISK_1: restoring archived log
archived log thread=1 sequence=1024
channel ORA_AUX_DISK_1: restoring archived log
archived log thread=1 sequence=1025
channel ORA_AUX_DISK_1: reading from backup piece /opt/app/oracle/backup/bck_arch_16_testdb_1
channel ORA_AUX_DISK_1: piece handle=/opt/app/oracle/backup/bck_arch_16_testdb_1 tag=TAG20110621T070623
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:07
archived log file name=/opt/app/oracle/fra/CLON/archivelog/2011_06_21/o1_mf_1_1024_700sp3lj_.arc thread=1 sequence=1024
channel clone_default: deleting archived log(s)
archived log file name=/opt/app/oracle/fra/CLON/archivelog/2011_06_21/o1_mf_1_1024_700sp3lj_.arc RECID=2 STAMP=754393158
archived log file name=/opt/app/oracle/fra/CLON/archivelog/2011_06_21/o1_mf_1_1025_700sp3p1_.arc thread=1 sequence=1025
channel clone_default: deleting archived log(s)
archived log file name=/opt/app/oracle/fra/CLON/archivelog/2011_06_21/o1_mf_1_1025_700sp3p1_.arc RECID=1 STAMP=754393155
media recovery complete, elapsed time: 00:00:04
Finished recover at 21-JUN-11
Oracle instance started
Additional step has been run before recover process - file from skipped table space has been dropped. In that case recovery process should touch it anymore from any location.

This exercise is showing that all new features should be tested in safe environment and in addition to that every operation on production database has to be well tested. I can't blame only Recovery Manager here but it would nice to have more checks in place (ex. if not all files are in backup just cancel duplication after control file has been restored). This is a perfect example of human error (at least someone created that scripts and run it) which can in complex environment. 

regards,
Marcin

0 comments: