Sunday, September 18, 2011

How to restore Oracle Grid Infrastructure OCR and vote disk on ASM

After reading the book about Oracle RAC (see previous post) I decided to test recovery of OCR (Oracle Cluster Registry) file and vote disk placed on ASM disk groups. This is an operation I never did before. I was lucky enough and my ASM based RAC installation never had similar issue but I did it before when OCR and vote disk were placed on raw or block devices.
According to Oracle doc it should be straight forward and it really is but you need to careful otherwise you can hit strange errors, like "segmentation fault" for ocrconfig when Grid Infrastructure is down.
Here is step by step instruction:
  1. Check if you can access OCR files
    [root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrcheck
    PROT-602: Failed to retrieve data from the cluster registry
    PROC-26: Error while accessing the physical storage
    
  2. Stop CRS on all nodes
    [root@node1 ~]# export CRS_HOME=/u01/app/11.2.0/grid
    [root@node1 ~]# $CRS_HOME/bin/crsctl stop crs
    CRS-2796: The command may not proceed when Cluster Ready Services is not running
    CRS-4687: Shutdown command has completed with errors.
    CRS-4000: Command Stop failed, or completed with errors.
    
    If you got error like that try force option
    [root@node1 ~]# $CRS_HOME/bin/crsctl stop crs -f
    CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
    CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'
    CRS-2673: Attempting to stop 'ora.evmd' on 'node1'
    CRS-2673: Attempting to stop 'ora.asm' on 'node1'
    CRS-2673: Attempting to stop 'ora.mdnsd' on 'node1'
    CRS-2677: Stop of 'ora.asm' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node1'
    CRS-2677: Stop of 'ora.mdnsd' on 'node1' succeeded
    CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node1' succeeded
    CRS-2677: Stop of 'ora.evmd' on 'node1' succeeded
    CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.cssd' on 'node1'
    CRS-2677: Stop of 'ora.cssd' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.diskmon' on 'node1'
    CRS-2673: Attempting to stop 'ora.crf' on 'node1'
    CRS-2677: Stop of 'ora.crf' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.gipcd' on 'node1'
    CRS-2677: Stop of 'ora.diskmon' on 'node1' succeeded
    CRS-2677: Stop of 'ora.gipcd' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
    CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
    CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
    CRS-4133: Oracle High Availability Services has been stopped.
    [root@node1 ~]#
    
  3. When OCR file and vote disk are located on ASM disk group running ASM instance is required for recovery. Current version of Oracle Grid Infrastructure (GI) can be started in exclusive mode on one node to allow checks and restore.
    There is a slight change between version 11.2.0.1 and 11.2.0.2 - in first release we need to start GI in exclusive mode and then stop CRS service (as showed below). In 11.2.0.2 we can start GI without CRS but old steps are still working.
    Steps for 11.2.0.1
    [root@node1 ~]# $CRS_HOME/bin/crsctl start crs -excl
    CRS-4123: Oracle High Availability Services has been started.
    CRS-2672: Attempting to start 'ora.mdnsd' on 'node1'
    CRS-2676: Start of 'ora.mdnsd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.gpnpd' on 'node1'
    CRS-2676: Start of 'ora.gpnpd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
    CRS-2672: Attempting to start 'ora.gipcd' on 'node1'
    CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
    CRS-2676: Start of 'ora.gipcd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'node1'
    CRS-2672: Attempting to start 'ora.diskmon' on 'node1'
    CRS-2676: Start of 'ora.diskmon' on 'node1' succeeded
    CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.ctssd' on 'node1'
    CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'node1'
    CRS-2676: Start of 'ora.ctssd' on 'node1' succeeded
    CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
    Start action for HAIP aborted
    CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'node1' failed
    CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'node1'
    CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.asm' on 'node1'
    CRS-2674: Start of 'ora.asm' on 'node1' failed
    CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'
    CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded
    CRS-4000: Command Start failed, or completed with errors.
    [root@node1 ~]#
    
    Errors are expected so now we need to stop CRS (if it has been started)
    [root@node1 ~]# $CRS_HOME/bin/crsctl stop resource ora.crsd -init
    [root@node1 ~]# $CRS_HOME/bin/crsctl stop resource ora.crsd -init
    CRS-2500: Cannot stop resource 'ora.crsd' as it is not running
    CRS-4000: Command Stop failed, or completed with errors.
    
    Steps for 11.2.0.2 - this allow us to avoid starting CRS which is trying to start resources. Issues related to CRS resource will bring down HAIP and ASM - and this is root cause of errors shown in 11.2.0.1 example.
    [root@node1 ~]# $CRS_HOME/bin/crsctl start crs -excl -nocrs
    CRS-4123: Oracle High Availability Services has been started.
    CRS-2672: Attempting to start 'ora.mdnsd' on 'node1'
    CRS-2676: Start of 'ora.mdnsd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.gpnpd' on 'node1'
    CRS-2676: Start of 'ora.gpnpd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
    CRS-2672: Attempting to start 'ora.gipcd' on 'node1'
    CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
    CRS-2676: Start of 'ora.gipcd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'node1'
    CRS-2672: Attempting to start 'ora.diskmon' on 'node1'
    CRS-2676: Start of 'ora.diskmon' on 'node1' succeeded
    CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'node1'
    CRS-2672: Attempting to start 'ora.ctssd' on 'node1'
    CRS-2676: Start of 'ora.ctssd' on 'node1' succeeded
    CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.asm' on 'node1'
    CRS-2676: Start of 'ora.asm' on 'node1' succeeded
    
    No errors has been displayed and GI has been started in exclusive mode.
  4. Now it is time to check if ASM instance is running and recreate ASM disk groups.
    [oracle@node1 ~]$ ps aux | grep pmon
    oracle    8009  0.0  0.9 495804 18564 ?        Ss   20:37   0:00 asm_pmon_+ASM1
    oracle    8260  0.0  0.0  61184   760 pts/3    S+   20:38   0:00 grep pmon
    
    ASM is running. Now we can connect and create a disk group for OCR and vote disk. Be sure that your new group will have proper COMPATIBLE.ASM value. You can put OCR and vote on groups with COMPATIBLE.ASM greater than 11.2.
    [oracle@node1 ~]$ export ORACLE_HOME=/u01/app/11.2.0/grid
    [oracle@node1 ~]$ export PATH=$PATH:$ORACLE_HOME/bin
    [oracle@node1 ~]$ export ORACLE_SID=+ASM1
    [oracle@node1 ~]$ sqlplus / as sysasm
    SQL*Plus: Release 11.2.0.2.0 Production on Tue Aug 30 20:39:14 2011
    Copyright (c) 1982, 2010, Oracle.  All rights reserved.
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
    With the Real Application Clusters and Automatic Storage Management options
    SQL> show parameter asm_di
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    asm_diskgroups                       string
    asm_diskstring                       string
    
    SQL> alter system set asm_diskstring = '/dev/asmdisk*';
    System altered.
    
    SQL> create diskgroup DATA external redundancy disk '/dev/asmdisk11' attribute 'COMPATIBLE.ASM' = '11.2';
    Diskgroup created.
    
  5. In next step we need to create spfile for ASM instance as it will be used for rest of cluster to find out location of ASM disks using 'asm_diskstring'.
    First text init.ora file has been created and then new binary spfile has been created on ASM disk group.
    [oracle@node1 rac-cluster]$ cat /tmp/init.ora
    *.asm_power_limit=1
    *.diagnostic_dest='/u01/app/oracle'
    *.instance_type='asm'
    *.large_pool_size=12M
    *.remote_login_passwordfile='EXCLUSIVE'
    *.asm_diskstring = '/dev/asmdisk*'
    [oracle@node1 rac-cluster]$ sqlplus / as sysasm
    SQL*Plus: Release 11.2.0.2.0 Production on Tue Aug 30 21:23:36 2011
    Copyright (c) 1982, 2010, Oracle.  All rights reserved.
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
    With the Real Application Clusters and Automatic Storage Management options
    
    SQL> create spfile='+DATA' from pfile='/tmp/init.ora';
    File created.
    
  6. When disk group is ready it is time now to restore OCR from backup file.
    Oracle Grid Infrastructure is doing OCR backup automatically into following location $GRID_HOME/cdata/. Let's restore and check if OCR file is recognized properly after it.
    [root@node1 ~]# $CRS_HOME/bin/ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup_20110830_201118.ocr
    
    [root@node1 ~]# $CRS_HOME/bin/ocrcheck
    Status of Oracle Cluster Registry is as follows :
             Version                  :          3
             Total space (kbytes)     :     262120
             Used space (kbytes)      :       3132
             Available space (kbytes) :     258988
             ID                       :  660203047
             Device/File Name         :      +DATA
                                        Device/File integrity check succeeded
                                        Device/File not configured
                                        Device/File not configured
                                        Device/File not configured
                                        Device/File not configured
             Cluster registry integrity check succeeded
             Logical corruption check succeeded
    
  7. Now it is time to restore vote disk. This process will read asm_diskstring from ASM instance and will place vote files on these disks. See what happen when asm_diskstring is empty.
    [root@node1 ~]# $CRS_HOME/bin/crsctl replace votedisk +DATA
    CRS-4602: Failed 27 to add voting file 28652f742fc44f28bfc6d12d1412a604.
    Failed to replace voting disk group with +DATA.
    CRS-4000: Command Replace failed, or completed with errors.
    
    Error message from log file.
    [cssd(7894)]CRS-1638:Unable to locate voting file with ID 1b37b25b-686c4fb4-bfb82eac-357f48df that is being added to the list of configured voting files; details at (:CSSNM00022:) in /u01/app/11.2.0/grid/log/node1/cssd/ocssd.log
    2011-08-30 20:44:12.256
    
    When asm_diskstring is set up properly it is looking much better.
    [root@node1 ~]# $CRS_HOME/bin/crsctl replace votedisk +DATA
    Successful addition of voting disk 4ca8c2b58d394ff1bf7a9b88dd9f5fc3.
    Successfully replaced voting disk group with +DATA.
    CRS-4266: Voting file(s) successfully replaced
    [root@node1 ~]#  $CRS_HOME/bin/crsctl query css votedisk
    ##  STATE    File Universal Id                File Name Disk group
    --  -----    -----------------                --------- ---------
     1. ONLINE   4ca8c2b58d394ff1bf7a9b88dd9f5fc3 (/dev/asmdisk11) [DATA]
    Located 1 voting disk(s).
    
  8. In last step Grid Infrastructure need to be restarted on all nodes.
    [root@node1 ~]#  $CRS_HOME/bin/crsctl stop crs -f
    CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
    CRS-2673: Attempting to stop 'ora.mdnsd' on 'node1'
    CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'
    CRS-2673: Attempting to stop 'ora.asm' on 'node1'
    CRS-2677: Stop of 'ora.asm' on 'node1' succeeded
    CRS-2677: Stop of 'ora.mdnsd' on 'node1' succeeded
    CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.cssd' on 'node1'
    CRS-2677: Stop of 'ora.cssd' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.gipcd' on 'node1'
    CRS-2673: Attempting to stop 'ora.diskmon' on 'node1'
    CRS-2677: Stop of 'ora.diskmon' on 'node1' succeeded
    CRS-2677: Stop of 'ora.gipcd' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
    CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
    CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
    CRS-4133: Oracle High Availability Services has been stopped.
    
    [root@node1 ~]#  $CRS_HOME/bin/crsctl start crs
    CRS-4123: Oracle High Availability Services has been started.
    
    If everything went well similar result is expected
    [root@node1 ~]# $CRS_HOME/bin/crsctl check cluster -all
    **************************************************************
    node1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    node2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    node3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    
Now Grid Infrastructure is working and if database ASM disk groups has been destroyed as well or we have everything in one ASM disk groups (like in my example) it is time to restore database from backup using Oracle Recovery Manager. But this is a different story.

Hope it help someone to restore Grid Infrastructure after disk crash.
regards,
Marcin

Monday, August 29, 2011

Book review - Oracle 11g R1/R2 Real Application Clusters Essentials

Recently I have read a new book “Oracle 11g R1/R2 Real Application Clusters Essentials” by Ben Prusinski and Syed Jaffer Hussain.. First of all I would like to thanks both authors for good lecture for few days. Whole book is divided into a few parts:
  • introduction to Oracle Real Application Server including High Availability concepts, various storage options and overall Oracle RAC architecture - chapter 1 and 2
  • installation of Oracle RAC components - Clusterware, Automatic Storage Management - chapter 3
  • overview of Oracle ASM - chapter 4
  • cluster components troubleshooting - chapter 5
  • Oracle Database installation  - chapter 6
  • Day by day topics including administration, backup, performance tuning and upgrade of Oracle Real Application Cluster - chapter 6, 7, 8 and 9
  • Real life examples including implementation of Oracle EBS using Oracle RAC - chapter 10 and 11
  • Oracle Maximum Availability - Chapter 12
  • Overview of Oracle tools for more experience users - Appendinx
In a first part authors are presenting High Availability options for Oracle and describing all main hardware components required for Oracle RAC implementation (server, network and storage options). Next chapter is covering an Oracle RAC architecture and it is covering mostly 11g Release 1. There is some information about a 11g Release 2 as well but due to number of changes between both version I would like to see two chapters instead of one - some additional information about a 11gR2 is covered at the end of appendix.

The next part is a guide how to install whole an Oracle RAC infrastructure - Clusterware and Automatic Storage Management in case of 11g Release 1 or Grid Infrastructure in case of version 11g Release 2. It is well documented and I will help new users to go though this process.

The next chapter is covering all topics related to Oracle Automatic Storage Management. Authors are presenting concepts and overview of this component as well as basic administration options. There is also separate part about new features of Oracle ASM 11g Release 2 covering ACFS file system.   

Chapter 5 is about managing and  troubleshooting Oracle RAC components. Some tools and common scenarios based on typical clusterware issues are presented there together with solving tips.

The next chapters (6,7,8 and 9) describe installation of database as well as new features of an Oracle 11g Release 1 and Release 2. In subsequent paragraph we can read about Oracle RAC High Availability features like - Load Balancing, Transparent Application Failover and Fast Connection Failover. After that backup and recovery strategy for database and whole clusterware comes to play. From management perspective this part is very important and should be study carefully. Next part is a overview of performance tuning tools, wait event and statistics important any performance improvement work.

Real life examples is in my opinion most interesting part of the book. All readers can go though typical scenario like adding or deleting node from cluster or converting database from single instance to RAC database. Configuration of Oracle EBS with Oracle RAC is also very well documented.

When I went though whole book I have to say that it was a good lecture and I can recommend it for everybody who want to do first dive into Oracle RAC world. Furthermore I think that more experience users will find this book interesting and worth to read too.

regards,
Marcin

Wednesday, August 24, 2011

Oracle VM 3.0

Announced yesterday Oracle VM 3.0 can be downloaded from edelivery.oracle.com.
I have downloaded Oracle VM server - 177 M and Oracle VM Manager - 2.4 GB. As far as I remember Oracle VM Manager is based on Oracle Database and Weblogic stack and this explain a little bit size of iso image with VM manager.
Documentation can found here -Oracle VM 3.0 documentation.
ps. Going to install it now

regards,
Marcin

Saturday, August 6, 2011

PL/SQL myths busted

RSS feed from Oakie's blog point me to Toon's Koppelaars post and his redirection to Morten Braten blog. Morten wrote excellent post about stored PL/SQL procedures - Mythbusters: Stored Procedures Edition.

regards,
Marcin

Thursday, July 28, 2011

How to find SQL_ID and PLAN_HASH_VALUE in Oracle SQL Plan Management Baselines

SQL_ID and/or PLAN_HASH_VALUE are used almost globally across Oracle database to identify query. This is not a case for SPM mechanism
using different identifiers - SQL_HANDLE for query and PLAN_NAME for execution plan. SQL_HANDLE contain hexadecimal representation of EXACT_MATCHING_SIGNATURE from V$SQL but we don't have any information in DBA_SQL_PLAN_BASELINES about SQL_ID and PLAN_HASH_VALUE.
After some research I found two ways to find SQL_ID matched to SQL baseline (if there is more please let me know).
Let me present following example - first of all test environment will be prepared
SQL> select sql_handle, plan_name from dba_sql_plan_baselines;

no rows selected

SQL> select count(*) from soe.customers where customer_id = 19998;

  COUNT(*)
----------
         1

SQL> select sql_id, plan_hash_value, exact_matching_signature, sql_plan_baseline from v$sql where sql_text = 'select count(*) from soe.customers where customer_id = 19998';

SQL_ID                       PLAN_HASH_VALUE       EXACT_MATCHING_SIGNATURE SQL_PLAN_BASELINE
------------- ------------------------------ ------------------------------ ------------------------------
aa1m3a9b64srz                     1925551782           18089075686129158449
There is no plan baseline at the beginning to make it more clear and only baseline for test query will be loaded.
SQL> vari rc number
SQL> exec :rc:=dbms_spm.load_plans_from_cursor_cache(sql_id=>'aa1m3a9b64srz',plan_hash_value=>1925551782)

PL/SQL procedure successfully completed.

SQL> print rc

                            RC
------------------------------
                             1

SQL> select sql_id, plan_hash_value, exact_matching_signature, sql_plan_baseline from v$sql where sql_text = 'select count(*) from soe.customers where customer_id = 19998';

SQL_ID                       PLAN_HASH_VALUE       EXACT_MATCHING_SIGNATURE SQL_PLAN_BASELINE
------------- ------------------------------ ------------------------------ ------------------------------
aa1m3a9b64srz                     1925551782           18089075686129158449

SQL> select count(*) from soe.customers where customer_id = 19998;

                      COUNT(*)
------------------------------
                             1

SQL> select sql_id, plan_hash_value, exact_matching_signature, sql_plan_baseline from v$sql where sql_text = 'select count(*) from soe.customers where customer_id = 19998';

SQL_ID                       PLAN_HASH_VALUE       EXACT_MATCHING_SIGNATURE SQL_PLAN_BASELINE
------------- ------------------------------ ------------------------------ ------------------------------
aa1m3a9b64srz                     1925551782           18089075686129158449
Plan baseline has been loaded but still not used by query - new cursor has to be opened.
SQL> alter system flush shared_pool;

System altered.

SQL> select count(*) from soe.customers where customer_id = 19998;

                      COUNT(*)
------------------------------
                             1

SQL> select sql_id, plan_hash_value, exact_matching_signature, sql_plan_baseline from v$sql where sql_text = 'select count(*) from soe.customers where customer_id = 19998';

SQL_ID                       PLAN_HASH_VALUE       EXACT_MATCHING_SIGNATURE SQL_PLAN_BASELINE
------------- ------------------------------ ------------------------------ ------------------------------
aa1m3a9b64srz                     1925551782           18089075686129158449 SQL_PLAN_gq2afgwdf4k9je2333459
After reload test query is using SQL plan baseline.

Now is a time to present first method of conversing SQL_HANLDE and PLAN_NAME into SQL_ID and PLAN_HASH_VALUE. This method is simple but it has one pitfall - query has to be in shared pool.
SQL> select sql_id, plan_hash_value, exact_matching_signature, sql_plan_baseline from v$sql where sql_text = 'select count(*) from soe.customers where customer_id = 19998';

SQL_ID                       PLAN_HASH_VALUE       EXACT_MATCHING_SIGNATURE SQL_PLAN_BASELINE
------------- ------------------------------ ------------------------------ ------------------------------
aa1m3a9b64srz                     1925551782           18089075686129158449 SQL_PLAN_gq2afgwdf4k9je2333459

SQL> select sql_handle from dba_sql_plan_baselines where plan_name = 'SQL_PLAN_gq2afgwdf4k9je2333459';

SQL_HANDLE
------------------------------
SQL_fb094e7f1ae24931

SQL> select sql_handle, plan_name from dba_sql_plan_baselines where sql_handle = 'SQL_fb094e7f1ae24931';

SQL_HANDLE                     PLAN_NAME
------------------------------ ------------------------------
SQL_fb094e7f1ae24931           SQL_PLAN_gq2afgwdf4k9je2333459
First we need to display SQL_PLAN_BASELINE from V$SQL which is related to PLAN_NAME column in DBA_SQL_PLAN_BASELINES. Second query shows SQL_HANDLE for that PLAN_NAME and last one will show all entries (it can be more plans for one query) for this particular SQL_HANDLE. At the end all required information are on screen. I have used this method for long time but I hit into problem several times as my query wasn't in v$sql anymore.

Second method is more complicated but it is working even when SQL is not in shared pool any more. DBMS_XPLAN has possibility to display plan for particular SQL_HANDLE and it will be our source of PLAN_HASH_VALUE
SQL> select * from table(dbms_xplan.DISPLAY_SQL_PLAN_BASELINE('SQL_fb094e7f1ae24931'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SQL_fb094e7f1ae24931
SQL text: select count(*) from soe.customers where customer_id = 19998
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_gq2afgwdf4k9je2333459         Plan id: 3795006553
Enabled: YES     Fixed: NO      Accepted: YES     Origin: MANUAL-LOAD
--------------------------------------------------------------------------------

Plan hash value: 1925551782

-------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |     1 |     6 |     1   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |              |     1 |     6 |            |          |       |       |
|   2 |   PARTITION HASH SINGLE|              |     1 |     6 |     1   (0)| 00:00:01 |    16 |    16 |
|*  3 |    INDEX UNIQUE SCAN   | CUSTOMERS_PK |     1 |     6 |     1   (0)| 00:00:01 |    16 |    16 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CUSTOMER_ID"=19998)

26 rows selected.
Unfortunately there is no still information about SQL_ID. But I realize that DBA_SQL_PLAN_BASELINES is keeping full text of SQL query so I should be able to calculate HASH_VALUE. Anyway this is what DBMS_XPLAN is using to generate PLAN_HASH_VALUE - optimizer is generating plan when DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE has been invoked. I tried to calculate SQL_ID using md5sum or perl but then I recall post from Tanel Poder blog how to calculate SQL_ID based on HASH_VALUE and other post by Slavik Markovich how to calculate SQL_ID using PL/SQL. Great Work Guys !
Rest was quite simple - I joined all parts together (PL/SQL loop calculating SQL_ID has been taken from Slavik script)
declare
v_sqlid VARCHAR2(13);
v_num number;
BEGIN
dbms_output.put_line('SQL_ID       '||' '|| 'PLAN_HASH_VALUE' || ' ' || 'SQL_HANDLE                    ' || ' ' || 'PLAN_NAME');
dbms_output.put_line('-------------'||' '|| '---------------' || ' ' || '------------------------------' || ' ' || '--------------------------------');
for a in (select sql_handle, plan_name, trim(substr(g.PLAN_TABLE_OUTPUT,instr(g.PLAN_TABLE_OUTPUT,':')+1)) plan_hash_value, sql_text
                 from (select t.*, c.sql_handle, c.plan_name, c.sql_text from dba_sql_plan_baselines c, table(dbms_xplan.DISPLAY_SQL_PLAN_BASELINE(c.sql_handle, c.plan_name)) t
                 where c.sql_handle = '&sql_handle') g
                 where PLAN_TABLE_OUTPUT like 'Plan hash value%') loop
    v_num := to_number(sys.UTL_RAW.reverse(sys.UTL_RAW.SUBSTR(sys.dbms_crypto.hash(src => UTL_I18N.string_to_raw(a.sql_text || chr(0),'AL32UTF8'), typ => 2),9,4)) || sys.UTL_RAW.reverse(sys.UTL_RAW.SUBSTR(sys.dbms_crypto.hash(src => UTL_I18N.string_to_raw(a.sql_text || chr(0),'AL32UTF8'), typ => 2),13,4)),RPAD('x', 16, 'x'));
    v_sqlid := '';
    FOR i IN 0 .. FLOOR(LN(v_num) / LN(32))
    LOOP
        v_sqlid := SUBSTR('0123456789abcdfghjkmnpqrstuvwxyz',FLOOR(MOD(v_num / POWER(32, i), 32)) + 1,1) || v_sqlid;
    END LOOP;
    dbms_output.put_line(v_sqlid ||' ' || rpad(a.plan_hash_value,15) || ' ' || rpad(a.sql_handle,30) ||  ' ' || rpad(a.plan_name,30));
end loop;
end;
How it work:
SQL> declare
  2  v_sqlid VARCHAR2(13);
  3  v_num number;
  4  BEGIN
dbms_output.put_line('SQL_ID       '||' '|| 'PLAN_HASH_VALUE' || ' ' || 'SQL_HANDLE                    ' || ' ' || 'PLAN_NAME');
  6      LOOP
dbms_output.put_line('-------------'||' '|| '---------------' || ' ' || '------------------------------' || ' ' || '--------------------------------');
for a in (select sql_handle, plan_name, trim(substr(g.PLAN_TABLE_OUTPUT,instr(g.PLAN_TABLE_OUTPUT,':')+1)) plan_hash_value, sql_text
                 from (select t.*, c.sql_handle, c.plan_name, c.sql_text from dba_sql_plan_baselines c, table(dbms_xplan.DISPLAY_SQL_PLAN_BASELINE(c.sql_handle, c.plan_name)) t
                 where c.sql_handle = '&sql_handle') g
                 where PLAN_TABLE_OUTPUT like 'Plan hash value%') loop
    v_num := to_number(sys.UTL_RAW.reverse(sys.UTL_RAW.SUBSTR(sys.dbms_crypto.hash(src => UTL_I18N.string_to_raw(a.sql_text || chr(0),'AL32UTF8'), typ => 2),9,4)) || sys.UTL_RAW.reverse(sys.UTL_RAW.SUBSTR(sys.dbms_crypto.hash(src => UTL_I18N.string_to_raw(a.sql_text || chr(0),'AL32UTF8'), typ => 2),13,4)),RPAD('x', 16, 'x'));
    v_sqlid := '';
    FOR i IN 0 .. FLOOR(LN(v_num) / LN(32))
    LOOP
        v_sqlid := SUBSTR('0123456789abcdfghjkmnpqrstuvwxyz',FLOOR(MOD(v_num / POWER(32, i), 32)) + 1,1) || v_sqlid;
    END LOOP;
    dbms_output.put_line(v_sqlid ||' ' || rpad(a.plan_hash_value,15) || ' ' || rpad(a.sql_handle,30) ||  ' ' || rpad(a.plan_name,30));
 18  end loop;
 19  end;
 20  /
Enter value for sql_handle: SQL_fb094e7f1ae24931
old   9:                  where c.sql_handle = '&sql_handle') g
new   9:                  where c.sql_handle = 'SQL_fb094e7f1ae24931') g
SQL_ID        PLAN_HASH_VALUE SQL_HANDLE                     PLAN_NAME
------------- --------------- ------------------------------ --------------------------------
aa1m3a9b64srz 1925551782      SQL_fb094e7f1ae24931           SQL_PLAN_gq2afgwdf4k9je2333459

PL/SQL procedure successfully completed.
Let me clean shared pool
SQL> alter system flush shared_pool;

System altered.

SQL> select sql_id, plan_hash_value, exact_matching_signature, sql_plan_baseline from v$sql where sql_text = 'select count(*) from soe.customers where customer_id = 19998';

no rows selected
Let's try one more time
SQL> declare
  2  v_sqlid VARCHAR2(13);
  3  v_num number;
  4  BEGIN
dbms_output.put_line('SQL_ID       '||' '|| 'PLAN_HASH_VALUE' || ' ' || 'SQL_HANDLE                    ' || ' ' || 'PLAN_NAME');
  6      LOOP
dbms_output.put_line('-------------'||' '|| '---------------' || ' ' || '------------------------------' || ' ' || '--------------------------------');
for a in (select sql_handle, plan_name, trim(substr(g.PLAN_TABLE_OUTPUT,instr(g.PLAN_TABLE_OUTPUT,':')+1)) plan_hash_value, sql_text
                 from (select t.*, c.sql_handle, c.plan_name, c.sql_text from dba_sql_plan_baselines c, table(dbms_xplan.DISPLAY_SQL_PLAN_BASELINE(c.sql_handle, c.plan_name)) t
                 where c.sql_handle = '&sql_handle') g
                 where PLAN_TABLE_OUTPUT like 'Plan hash value%') loop
    v_num := to_number(sys.UTL_RAW.reverse(sys.UTL_RAW.SUBSTR(sys.dbms_crypto.hash(src => UTL_I18N.string_to_raw(a.sql_text || chr(0),'AL32UTF8'), typ => 2),9,4)) || sys.UTL_RAW.reverse(sys.UTL_RAW.SUBSTR(sys.dbms_crypto.hash(src => UTL_I18N.string_to_raw(a.sql_text || chr(0),'AL32UTF8'), typ => 2),13,4)),RPAD('x', 16, 'x'));
    v_sqlid := '';
    FOR i IN 0 .. FLOOR(LN(v_num) / LN(32))
    LOOP
        v_sqlid := SUBSTR('0123456789abcdfghjkmnpqrstuvwxyz',FLOOR(MOD(v_num / POWER(32, i), 32)) + 1,1) || v_sqlid;
    END LOOP;
    dbms_output.put_line(v_sqlid ||' ' || rpad(a.plan_hash_value,15) || ' ' || rpad(a.sql_handle,30) ||  ' ' || rpad(a.plan_name,30));
 18  end loop;
 19  end;
 20  /
Enter value for sql_handle: SQL_fb094e7f1ae24931
old   9:                  where c.sql_handle = '&sql_handle') g
new   9:                  where c.sql_handle = 'SQL_fb094e7f1ae24931') g
SQL_ID        PLAN_HASH_VALUE SQL_HANDLE                     PLAN_NAME
------------- --------------- ------------------------------ --------------------------------
aa1m3a9b64srz 1925551782      SQL_fb094e7f1ae24931           SQL_PLAN_gq2afgwdf4k9je2333459

PL/SQL procedure successfully completed.

SQL> select sql_id, plan_hash_value, exact_matching_signature, sql_plan_baseline from v$sql where sql_text = 'select count(*) from soe.customers where customer_id = 19998';

no rows selected

SQL>
Hope it help work with SQL Plan Management feature.

regards,
Marcin

Sunday, July 24, 2011

RMAN Database duplication without target connectivity can corrupt target database

Latest version of Oracle 11g R2 has new Recovery Manager functionality - database duplication from backup without connectivity to target database. During tests I have found unexpected behavior of that functionality. I'm still not sure if it is bug or RMAN is doing it by design but if backup doesn't contain all files and SKIP TABLESPACE is not included, duplication process can overwrite target database files. When database is using ASM it can happen regardless of database state. Due to file locking mechanism this issue occur on file system only when target database is down.

All has been performed using Oracle version 11.2.0.2 on Red Hat 5.3 running target and auxiliary database on one server using file system. I have tested it with ASM as well but log files are gone together with my VM machine.

Target database is down, Backup has been taken without one tablespace, 
no SKIP TABLESPACE clause.

Duplication script:
run {
set newname for tempefile 1 to '/opt/app/oracle/oradata/clon/temp01.dbf';
duplicate target database to clon
SPFILE
    PARAMETER_VALUE_CONVERT '/opt/app/oracle/admin/testdb/',
                            '/opt/app/oracle/admin/clon/'
    SET SGA_MAX_SIZE '500M'
    SET CONTROL_FILES '/opt/app/oracle/oradata/clon/control01.ctl','/opt/app/oracle/oradata/clon/control02.ctl'
    SET SGA_TARGET '450M'
    SET CLUSTER_DATABASE 'false'
    SET db_create_file_dest          '/opt/app/oracle/oradata/clon/'
    SET db_create_online_log_dest_1  '/opt/app/oracle/oradata/clon/'
    SET LOG_FILE_NAME_CONVERT  '/opt/app/oracle/oradata/testdb/','/opt/app/oracle/oradata/clon/'
    SET DB_FILE_NAME_CONVERT   '/opt/app/oracle/oradata/testdb/','/opt/app/oracle/oradata/clon/'
    BACKUP LOCATION '/opt/app/oracle/backup/';
}
Let's take a look on output - unimportant lines have been skipped
Recovery Manager: Release 11.2.0.2.0 - Production on Tue Jun 21 07:17:34 2011
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
connected to auxiliary database: CLON (not mounted)

Starting Duplicate Db at 21-JUN-11
...
Starting restore at 21-JUN-11
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=171 device type=DISK

channel ORA_AUX_DISK_1: restoring control file
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:03
output file name=/opt/app/oracle/oradata/clon/control01.ctl
output file name=/opt/app/oracle/oradata/clon/control02.ctl
Finished restore at 21-JUN-11

database mounted
released channel: ORA_AUX_DISK_1
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=171 device type=DISK

contents of Memory Script:
{
   set until scn  95159515;
   set newname for datafile  1 to 
 "/opt/app/oracle/oradata/clon/system-01.dbf";
   set newname for datafile  2 to 
 "/opt/app/oracle/oradata/clon/undo_t1-01.dbf";
   set newname for datafile  3 to 
 "/opt/app/oracle/oradata/clon/sysaux-01.dbf";
   set newname for datafile  4 to 
 "/opt/app/oracle/oradata/clon/administrator-01.dbf";
   set newname for datafile  5 to 
 "/opt/app/oracle/oradata/clon/administrator_idx-01.dbf";
   set newname for datafile  6 to 
 "/opt/app/oracle/oradata/clon/replication-01.dbf";
   set newname for datafile  7 to 
 "/opt/app/oracle/oradata/clon/replication_idx-01.dbf";
   set newname for datafile  8 to 
 "/opt/app/oracle/oradata/clon/soe.dbf";
   set newname for datafile  9 to 
 "/opt/app/oracle/oradata/clon/soeindex.dbf";
   set newname for datafile  10 to 
 "/opt/app/oracle/oradata/clon/users-01.dbf";
   restore
   clone database
   ;
}
executing Memory Script
executing command: SET until clause
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
So far so good - new name has been set for all files. What is next
file 9 is excluded from whole database backup
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00001 to /opt/app/oracle/oradata/clon/system-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00002 to /opt/app/oracle/oradata/clon/undo_t1-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00003 to /opt/app/oracle/oradata/clon/sysaux-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00004 to /opt/app/oracle/oradata/clon/administrator-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00005 to /opt/app/oracle/oradata/clon/administrator_idx-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00006 to /opt/app/oracle/oradata/clon/replication-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00007 to /opt/app/oracle/oradata/clon/replication_idx-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00008 to /opt/app/oracle/oradata/clon/soe.dbf
channel ORA_AUX_DISK_1: restoring datafile 00010 to /opt/app/oracle/oradata/clon/users-01.dbf
channel ORA_AUX_DISK_1: reading from backup piece /opt/app/oracle/backup/bck_13_testdb_1
channel ORA_AUX_DISK_1: piece handle=/opt/app/oracle/backup/bck_13_testdb_1 tag=TAG20110621T064346
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:06:36
failover to previous backup

Finished restore at 21-JUN-11
All files from backupset have been restored and Oracle realize that one file is missing. Keep in mind that new name has been prepared for all files.
contents of Memory Script:
{
   switch clone datafile all;
}
executing Memory Script

datafile 1 switched to datafile copy
input datafile copy RECID=10 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/system-01.dbf
datafile 2 switched to datafile copy
input datafile copy RECID=11 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/undo_t1-01.dbf
datafile 3 switched to datafile copy
input datafile copy RECID=12 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/sysaux-01.dbf
datafile 4 switched to datafile copy
input datafile copy RECID=13 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/administrator-01.dbf
datafile 5 switched to datafile copy
input datafile copy RECID=14 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/administrator_idx-01.dbf
datafile 6 switched to datafile copy
input datafile copy RECID=15 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/replication-01.dbf
datafile 7 switched to datafile copy
input datafile copy RECID=16 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/replication_idx-01.dbf
datafile 8 switched to datafile copy
input datafile copy RECID=17 STAMP=754384699 file name=/opt/app/oracle/oradata/clon/soe.dbf
datafile 10 switched to datafile copy
input datafile copy RECID=18 STAMP=754384700 file name=/opt/app/oracle/oradata/clon/users-01.dbf
This switch command is starting this issue - missing file hasn't been switch to new location
contents of Memory Script:
{
   set until scn  95159515;
   recover
   clone database
   delete archivelog
   ;
}
executing Memory Script
executing command: SET until clause
Starting recover at 21-JUN-11
using channel ORA_AUX_DISK_1
starting media recovery
channel ORA_AUX_DISK_1: starting archived log restore to default destination
channel ORA_AUX_DISK_1: restoring archived log
archived log thread=1 sequence=1024
channel ORA_AUX_DISK_1: restoring archived log
archived log thread=1 sequence=1025
channel ORA_AUX_DISK_1: reading from backup piece /opt/app/oracle/backup/bck_arch_16_testdb_1
channel ORA_AUX_DISK_1: piece handle=/opt/app/oracle/backup/bck_arch_16_testdb_1 tag=TAG20110621T070623
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:03
archived log file name=/opt/app/oracle/fra/CLON/archivelog/2011_06_21/o1_mf_1_1024_700kfxcx_.arc thread=1 sequence=1024
channel clone_default: deleting archived log(s)
archived log file name=/opt/app/oracle/fra/CLON/archivelog/2011_06_21/o1_mf_1_1024_700kfxcx_.arc RECID=2 STAMP=754384704
archived log file name=/opt/app/oracle/fra/CLON/archivelog/2011_06_21/o1_mf_1_1025_700kfxg6_.arc thread=1 sequence=1025
Oracle Error: 
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01152: file 9 was not restored from a sufficiently old backup 
ORA-01110: data file 9: '/opt/app/oracle/oradata/testdb/soeindex.dbf'
The worst part is here - RMAN was trying to apply logs to other database using entries from control file - the TARGET database entries. DBID at that time for target and clone database are still the same and target database was down so there are no locks. Nothing could prevent this process to overwrite target database file header.

What happen to target database ? The following error will be found in target alert.log file

Read of datafile '/opt/app/oracle/oradata/testdb/soeindex.dbf' (fno 9) header failed with ORA-01206
Rereading datafile 9 header failed with ORA-01206
Errors in file /opt/app/oracle/diag/rdbms/testdb/testdb/trace/testdb_dbw0_4984.trc:
ORA-01186: file 9 failed verification tests
ORA-01122: database file 9 failed verification check
File in target database has been corrupted during duplication process and it has to be restored from backup or dropped.

Target database is running, backup has been taken without one table space, 
no SKIP TABLESPACE clause.
Duplication process is starting in similar way to previous scenario but instead of overwriting target file the following error is reporting during recovery process.
RMAN-11001: Oracle Error: 
ORA-00283: recovery session canceled due to errors
ORA-01110: data file 9: '/opt/app/oracle/oradata/testdb/soeindex.dbf'
ORA-01157: cannot identify/lock data file 9 - see DBWR trace file
ORA-01110: data file 9: '/opt/app/oracle/oradata/testdb/soeindex.dbf'
This time duplication process failed but target database wasn't touched.

Target database is down, Backup has been taken without one tablespace, 
SKIP TABLESPACE clause has been setup
This time backup still doesn't have all files but SKIP_TABLESPACE has been added to script. Recovery manager is working now without any issues
Recovery Manager: Release 11.2.0.2.0 - Production on Tue Jun 21 09:32:21 2011
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
connected to auxiliary database: CLON (not mounted)
RMAN> run {
2> set newname for tempfile 1 to '/opt/app/oracle/oradata/clon/temp01.dbf';
3>                                                             
4> duplicate target database to clon                           
5> SPFILE                                                      
6>     PARAMETER_VALUE_CONVERT '/opt/app/oracle/admin/testdb/',
7>                             '/opt/app/oracle/admin/clon/'   
8>     SET SGA_MAX_SIZE '500M'                                 
9>     SET CONTROL_FILES '/opt/app/oracle/oradata/clon/control01.ctl','/opt/app/oracle/oradata/clon/control02.ctl'
10>     SET SGA_TARGET '450M'        
11>     SET CLUSTER_DATABASE 'false' 
12>     SET db_create_file_dest          '/opt/app/oracle/oradata/clon/'
13>     SET db_create_online_log_dest_1  '/opt/app/oracle/oradata/clon/'
14>     SET LOG_FILE_NAME_CONVERT  '/opt/app/oracle/oradata/testdb/','/opt/app/oracle/oradata/clon/'
15>     SET DB_FILE_NAME_CONVERT   '/opt/app/oracle/oradata/testdb/','/opt/app/oracle/oradata/clon/'
16>     SKIP TABLESPACE 'SOEINDEX'                       
17>     UNDO TABLESPACE 'UNDO_T1'                                                                   
18>     BACKUP LOCATION '/opt/app/oracle/backup/';                                                  
19> }     
20> 
executing command: SET NEWNAME

Starting Duplicate Db at 21-JUN-11
...
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=171 device type=DISK
Not connected to TARGET or TARGET not open, cannot verify that subset of tablespaces is self-contained
Not connected to TARGET, cannot verify that set of tablespaces being duplicated does not have SYS objects
Datafile 9 skipped by request

contents of Memory Script:
{
   set until scn  95159515;
   set newname for datafile  1 to 
 "/opt/app/oracle/oradata/clon/system-01.dbf";
   set newname for datafile  2 to 
 "/opt/app/oracle/oradata/clon/undo_t1-01.dbf";
   set newname for datafile  3 to 
 "/opt/app/oracle/oradata/clon/sysaux-01.dbf";
   set newname for datafile  4 to 
 "/opt/app/oracle/oradata/clon/administrator-01.dbf";
   set newname for datafile  5 to 
 "/opt/app/oracle/oradata/clon/administrator_idx-01.dbf";
   set newname for datafile  6 to 
 "/opt/app/oracle/oradata/clon/replication-01.dbf";
   set newname for datafile  7 to 
 "/opt/app/oracle/oradata/clon/replication_idx-01.dbf";
   set newname for datafile  8 to 
 "/opt/app/oracle/oradata/clon/soe.dbf";
   set newname for datafile  10 to 
 "/opt/app/oracle/oradata/clon/users-01.dbf";
   restore
   clone database
   skip forever tablespace  "SOEINDEX"   ;
}
executing Memory Script
executing command: SET until clause
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME

Starting restore at 21-JUN-11
using channel ORA_AUX_DISK_1

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00001 to /opt/app/oracle/oradata/clon/system-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00002 to /opt/app/oracle/oradata/clon/undo_t1-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00003 to /opt/app/oracle/oradata/clon/sysaux-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00004 to /opt/app/oracle/oradata/clon/administrator-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00005 to /opt/app/oracle/oradata/clon/administrator_idx-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00006 to /opt/app/oracle/oradata/clon/replication-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00007 to /opt/app/oracle/oradata/clon/replication_idx-01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00008 to /opt/app/oracle/oradata/clon/soe.dbf
channel ORA_AUX_DISK_1: restoring datafile 00010 to /opt/app/oracle/oradata/clon/users-01.dbf
channel ORA_AUX_DISK_1: reading from backup piece /opt/app/oracle/backup/bck_13_testdb_1
channel ORA_AUX_DISK_1: piece handle=/opt/app/oracle/backup/bck_13_testdb_1 tag=TAG20110621T064346
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:06:25
Finished restore at 21-JUN-11

contents of Memory Script:
{
   switch clone datafile all;
}
executing Memory Script

datafile 1 switched to datafile copy
input datafile copy RECID=10 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/system-01.dbf
datafile 2 switched to datafile copy
input datafile copy RECID=11 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/undo_t1-01.dbf
datafile 3 switched to datafile copy
input datafile copy RECID=12 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/sysaux-01.dbf
datafile 4 switched to datafile copy
input datafile copy RECID=13 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/administrator-01.dbf
datafile 5 switched to datafile copy
input datafile copy RECID=14 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/administrator_idx-01.dbf
datafile 6 switched to datafile copy
input datafile copy RECID=15 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/replication-01.dbf
datafile 7 switched to datafile copy
input datafile copy RECID=16 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/replication_idx-01.dbf
datafile 8 switched to datafile copy
input datafile copy RECID=17 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/soe.dbf
datafile 10 switched to datafile copy
input datafile copy RECID=18 STAMP=754393152 file name=/opt/app/oracle/oradata/clon/users-01.dbf

contents of Memory Script:
{
   set until scn  95159515;
   recover
   clone database
   skip forever tablespace  "SOEINDEX"    delete archivelog
   ;
}
executing Memory Script

executing command: SET until clause

Starting recover at 21-JUN-11
using channel ORA_AUX_DISK_1

Executing: alter database datafile 9 offline drop
starting media recovery

channel ORA_AUX_DISK_1: starting archived log restore to default destination
channel ORA_AUX_DISK_1: restoring archived log
archived log thread=1 sequence=1024
channel ORA_AUX_DISK_1: restoring archived log
archived log thread=1 sequence=1025
channel ORA_AUX_DISK_1: reading from backup piece /opt/app/oracle/backup/bck_arch_16_testdb_1
channel ORA_AUX_DISK_1: piece handle=/opt/app/oracle/backup/bck_arch_16_testdb_1 tag=TAG20110621T070623
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:07
archived log file name=/opt/app/oracle/fra/CLON/archivelog/2011_06_21/o1_mf_1_1024_700sp3lj_.arc thread=1 sequence=1024
channel clone_default: deleting archived log(s)
archived log file name=/opt/app/oracle/fra/CLON/archivelog/2011_06_21/o1_mf_1_1024_700sp3lj_.arc RECID=2 STAMP=754393158
archived log file name=/opt/app/oracle/fra/CLON/archivelog/2011_06_21/o1_mf_1_1025_700sp3p1_.arc thread=1 sequence=1025
channel clone_default: deleting archived log(s)
archived log file name=/opt/app/oracle/fra/CLON/archivelog/2011_06_21/o1_mf_1_1025_700sp3p1_.arc RECID=1 STAMP=754393155
media recovery complete, elapsed time: 00:00:04
Finished recover at 21-JUN-11
Oracle instance started
Additional step has been run before recover process - file from skipped table space has been dropped. In that case recovery process should touch it anymore from any location.

This exercise is showing that all new features should be tested in safe environment and in addition to that every operation on production database has to be well tested. I can't blame only Recovery Manager here but it would nice to have more checks in place (ex. if not all files are in backup just cancel duplication after control file has been restored). This is a perfect example of human error (at least someone created that scripts and run it) which can in complex environment. 

regards,
Marcin

Saturday, June 18, 2011

How to color (mark) SQL for AWR snapshots

Are all SQL statements you’re looking for in your AWR repository ? If not, there is a simple way to ‘ask’ Oracle to include list of SQL_ID in every snapshot. All you need to do is color it. No, not using marker on the screen with Grid Control Performance tab but using DBMS_WORKLOAD_REPOSITORY. ADD_COLORED_SQL  procedure. Since you mark sql_id as colored it will be included in every AWR snapshot even if it is not one of the top SQL's.
All colored sql_id together with creation date could be checked in the following view - DBA_HIST_COLORED_SQL. After all work there is a simple way to uncolor sql_id (make it white ? ). There is a procedure called DBMS_WORKLOAD_REPOSITORY.REMOVE_COLORED_SQL.

Simple example:
- script used to display SQL statistics from AWR repository
$ cat sql_stats.sql
col exetime format 99999999999.99
col begin_interval_time format a30
select begin_interval_time, PLAN_HASH_VALUE, ELAPSED_TIME_DELTA/EXECUTIONS_DELTA exetime, EXECUTIONS_DELTA, CPU_TIME_DELTA, ELAPSED_TIME_DELTA, DISK_READS_DELTA from dba_hist_sqlstat ss, dba_hist_snapshot s where s.snap_id = ss.snap_id and sql_id = '&SQL_ID' and s.snap_id > (select max(snap_id)-96 from dba_hist_snapshot) and EXECUTIONS_DELTA<>0 order by s.snap_id, sql_id;
Checking AWR repository for particular SQL_ID.
SQL> @sql_stats.sql
Enter value for sql_id: 6zfggtprazcvb
old   2: where s.snap_id = ss.snap_id and sql_id = '&SQL_ID' and s.snap_id > (select max(snap_id)-96 from dba_hist_snapshot) and EXECUTIONS_DELTA<>0 order by s.snap_id, sql_id
new   2: where s.snap_id = ss.snap_id and sql_id = '6zfggtprazcvb' and s.snap_id > (select max(snap_id)-96 from dba_hist_snapshot) and EXECUTIONS_DELTA<>0 order by s.snap_id, sql_id
BEGIN_INTERVAL_TIME            PLAN_HASH_VALUE         EXETIME EXECUTIONS_DELTA CPU_TIME_DELTA ELAPSED_TIME_DELTA DISK_READS_DELTA 
------------------------------ --------------- --------------- ---------------- -------------- ------------------ ---------------- 
16-JUN-11 04.00.43.639 PM            927619989     90627544.40               20       12879044         1812550888           125728 
16-JUN-11 04.15.51.312 PM            927619989    282777061.76              151      279247551         4.2699E+10            72065 
16-JUN-11 04.30.57.081 PM           1587981875      1308903.00                7         203969            9162321             1894 
16-JUN-11 09.46.00.195 PM           1587981875       889464.06               16         760882           14231425             2463 
17-JUN-11 12.45.49.596 AM           1587981875      1390759.44               16         778881           22252151             2973 
17-JUN-11 04.45.38.168 AM           1587981875       621587.94               16         682897            9945407             2320 
17-JUN-11 06.45.18.396 AM           1587981875       415950.50               16         692895            6655208             1774 
17-JUN-11 07.45.35.975 AM           1587981875       865986.07               15         642901           12989791             1228 

8 rows selected.
SQL has been executed in last hour a few times but was not included in AWR snapshot
SQL> select sample_time, session_id, sql_id, sql_plan_hash_value, sql_child_number  from v$active_session_history where sql_id = '6zfggtprazcvb' and sample_time > sysdate - 1/24 order by sample_time;

SAMPLE_TIME                   SESSION_ID SQL_ID        SQL_PLAN_HASH_VALUE SQL_CHILD_NUMBER
----------------------------- ---------- ------------- ------------------- ----------------
17-JUN-11 03.21.43.399 PM           1711 6zfggtprazcvb          1587981875               10
17-JUN-11 03.21.44.399 PM           1711 6zfggtprazcvb          1587981875               10
17-JUN-11 03.21.45.399 PM           1711 6zfggtprazcvb          1587981875               10
17-JUN-11 03.21.46.399 PM           1711 6zfggtprazcvb          1587981875               10
17-JUN-11 03.21.47.409 PM           1711 6zfggtprazcvb          1587981875               10
17-JUN-11 03.21.48.409 PM           1711 6zfggtprazcvb          1587981875               10
17-JUN-11 03.21.49.409 PM           1711 6zfggtprazcvb          1587981875               10
17-JUN-11 03.21.50.409 PM           1711 6zfggtprazcvb          1587981875               10
17-JUN-11 03.21.51.409 PM           1711 6zfggtprazcvb          1587981875               10
17-JUN-11 03.21.52.419 PM           1711 6zfggtprazcvb          1587981875               10
17-JUN-11 03.33.42.862 PM            474 6zfggtprazcvb          1587981875               10
17-JUN-11 03.33.43.862 PM            474 6zfggtprazcvb          1587981875               10
17-JUN-11 03.33.44.862 PM            474 6zfggtprazcvb          1587981875               10
17-JUN-11 03.49.37.819 PM           1724 6zfggtprazcvb          1587981875               10
17-JUN-11 03.49.38.819 PM           1724 6zfggtprazcvb          1587981875               10
SQL_ID has been colored (marked) using DBMS_WORKLOAD_REPOSITORY procedure
SQL> exec dbms_workload_repository.add_colored_sql('6zfggtprazcvb');

PL/SQL procedure successfully completed.

SQL> select * from DBA_HIST_COLORED_SQL;

      DBID SQL_ID        CREATE_TIME
---------- ------------- -------------------
1878812188 6zfggtprazcvb 2011-06-17 15:55:45
Now this SQL_ID has been included in AWR snapshot even if it has been executed once - see last row
SQL> @sql_stats.sql
Enter value for sql_id: 6zfggtprazcvb
old   2: where s.snap_id = ss.snap_id and sql_id = '&SQL_ID' and s.snap_id > (select max(snap_id)-96 from dba_hist_snapshot) and EXECUTIONS_DELTA<>0 order by s.snap_id, sql_id
new   2: where s.snap_id = ss.snap_id and sql_id = '6zfggtprazcvb' and s.snap_id > (select max(snap_id)-96 from dba_hist_snapshot) and EXECUTIONS_DELTA<>0 order by s.snap_id, sql_id

BEGIN_INTERVAL_TIME            PLAN_HASH_VALUE         EXETIME EXECUTIONS_DELTA CPU_TIME_DELTA ELAPSED_TIME_DELTA DISK_READS_DELTA 
------------------------------ --------------- --------------- ---------------- -------------- ------------------ ---------------- 
16-JUN-11 04.00.43.639 PM            927619989     90627544.40               20       12879044         1812550888           125728 
16-JUN-11 04.15.51.312 PM            927619989    282777061.76              151      279247551         4.2699E+10            72065 
16-JUN-11 04.30.57.081 PM           1587981875      1308903.00                7         203969            9162321             1894 
16-JUN-11 09.46.00.195 PM           1587981875       889464.06               16         760882           14231425             2463 
17-JUN-11 12.45.49.596 AM           1587981875      1390759.44               16         778881           22252151             2973 
17-JUN-11 04.45.38.168 AM           1587981875       621587.94               16         682897            9945407             2320 
17-JUN-11 06.45.18.396 AM           1587981875       415950.50               16         692895            6655208             1774 
17-JUN-11 07.45.35.975 AM           1587981875       865986.07               15         642901           12989791             1228 
17-JUN-11 03.45.11.552 PM           1587981875      1818946.00                1          12998            1818946              226 
At the end of this example SQL_ID has been uncolored (unmarked).
SQL> exec dbms_workload_repository.remove_colored_sql('6zfggtprazcvb');

PL/SQL procedure successfully completed.

SQL> select * from DBA_HIST_COLORED_SQL;

no rows selected

SQL>