Tuesday, October 4, 2011

Grid Control 11g - 2 small improvments

I installed Oracle Grid Control 11g months ago and left it with typical configuration. Recently it becomes a little bit unstable and did a quick audit. Here are my first 2 finding but and will do a deeper dive into into if only I will find some more time.

Memory settings
In default configuration of Weblogic Management server running Grid Control instance maximum number of Java stack memory (parameter -Xmx) is unset and it is using default value of 512 MB. It is too small for big installation with many agents and concurrent users. Weblogic configuration allow users to set USER_MEM_ARGS variable with Java memory settings in starting script which will be used instead of MEM_ARGS variable. I has goint to change startManagedWebLogic.sh file as I want to keep default values for admin server but I found better solution on  Oracle Enterprise Manager blog in "Increasing the JAVA heap size of the OMS" section. Instead of startManagedWebLogic.sh you need to edit startEMServer.sh file
and add following section just before last line
if [ "${SERVER_NAME}" != "EMGC_ADMINSERVER" ] ; then
   USER_MEM_ARGS="-Xms256m -Xmx1024m -XX:MaxPermSize=512m -XX:CompileThreshold=8000 -XX:PermSize=128m"
   export USER_MEM_ARGS
fi
Log rotation
It was very typical for old Oracle Application Server or Grid Control 10g that logs weren't rotated in default configurations. I have seen this many times when I have been asked to do some troubleshooting. After installation of Grid Control 11g together with Weblogic I was thinking how many logs has been left without rotation this time. Most of them looks OK but I found one mod_wl_ohs.log belonging to WebTier (Oracle HTTP server) which is still not rotated. This log file is configured in this configuration file mod_wl_ohs.conf and it is quite easy to add rotation to it. As all rotated Apache logs this line
WLLogFile ${ORACLE_INSTANCE}/diagnostics/logs/${COMPONENT_TYPE}/${COMPONENT_NAME}/mod_wl_ohs.log
should be started with rotatelog command:
WLLogFile "|${ORACLE_HOME}/ohs/bin/rotatelogs ${ORACLE_INSTANCE}/diagnostics/logs/${COMPONENT_TYPE}/${COMPONENT_NAME}/mod_wl_ohs.log 43200"

Update:
Oracle Enterprise Manager 12c (like Cloud) has been just officially presented. There is still not documentation but you can download and install it.
regards,
Marcin

Monday, October 3, 2011

Oracle PeopleSoft Enterprise Financial Management - book review

This is my second review in short time frame but totally different from first one. It is not about Oracle DB and I’m doing it from newbie position. This is my first experience with Oracle PeopleSoft from business perspective. Oracle PeopleSoft Enterprise Financial Management 9.1 Implementation is covering following topics from business perspective:
1. Financials Fundamentals
2. PeopleSoft Security
3. PeopleSoft Billing Module
4. PeopleSoft Accounts Receivable Module
5. PeopleSoft Asset Management Module
6. PeopleSoft Accounts Payable Module
7. PeopleSoft General Ledger Module
8. PeopleSoft Expenses Module
9. PeopleSoft Commitment Control

Whole book is covering one example based on Automotive company and every chapter is describing other business requirements.
All topics are explained in details and every topic has a short introduction from functional perspective. There are also implementation challenges for most common issues. Every chapter is guiding user though module configuration and typical actions. Examples have slides and description of all actions.

I can recommend this book for all for people without financial background like me as a starting point into PeopleSoft application world.

regards,
Marcin

Saturday, October 1, 2011

Online index rebuild cleanup

 If session performing online index rebuild will be killed by mistake Oracle end up with some inconsistency inside dictionary. If you try to re-run killed command to create index once more database will complain that index already exist. Although if you try to drop index database will complain that index is in rebuild state and could not be dropped at that time.
To fix that issue you need to use package dbms_repair as showed in example:

  1. Building test table
    SQL> create table test tablespace users as select rownum id, 'xxxxxxxxxxxxxxxxxxxxxxxx' col1 from dba_source, dba_source where rownum < 10000000;
    Table created.
    
  2. New index build has been started with online clause and session has been killed
    SQL> create index test_index on test (id) tablespace users online;
    create index test_index on test (id) tablespace users online
                                                               *
    ERROR at line 1:
    ORA-03113: end-of-file on communication channel
    Process ID: 4356
    Session ID: 191 Serial number: 13
    
  3. Sanity check - index exist even if build has been never completed.
    SQL> select index_name from dba_indexes where table_name = 'TEST';
    INDEX_NAME
    ------------------------------
    TEST_INDEX
    
  4. Now let's try to drop it
    SQL> drop index test_index;
    drop index test_index
               *
    ERROR at line 1:
    ORA-08104: this index object 66960 is being online built or rebuilt
    
  5. So maybe we can create it again ?
    SQL> create index test_index on test (id) tablespace users online;
    create index test_index on test (id) tablespace users online
                 *
    ERROR at line 1:
    ORA-00955: name is already used by an existing object
    
  6. Let's clean it up and allow Oracle to drop index - ONLINE_INDEX_CLEAN need a object number which can be taken from drop index error message or using the following query (at least it worked for me).
    SQL> select min(object_id) from dba_objects where object_name = 'TEST_INDEX';
    
    MIN(OBJECT_ID)
    --------------
            66960
    
    SQL> declare
      2  ret boolean;
      3
      4  begin
      5      ret:=dbms_repair.ONLINE_INDEX_CLEAN(66960);
      6  end;
      7  /
    
    PL/SQL procedure successfully completed.
    
    and now let's check if index is still there
    SQL> select index_name from dba_indexes where table_name = 'TEST';
    no rows selected
    
Hope it help to solve some of your problems.
regards,
Marcin

Sunday, September 18, 2011

How to restore Oracle Grid Infrastructure OCR and vote disk on ASM

After reading the book about Oracle RAC (see previous post) I decided to test recovery of OCR (Oracle Cluster Registry) file and vote disk placed on ASM disk groups. This is an operation I never did before. I was lucky enough and my ASM based RAC installation never had similar issue but I did it before when OCR and vote disk were placed on raw or block devices.
According to Oracle doc it should be straight forward and it really is but you need to careful otherwise you can hit strange errors, like "segmentation fault" for ocrconfig when Grid Infrastructure is down.
Here is step by step instruction:
  1. Check if you can access OCR files
    [root@node1 ~]# /u01/app/11.2.0/grid/bin/ocrcheck
    PROT-602: Failed to retrieve data from the cluster registry
    PROC-26: Error while accessing the physical storage
    
  2. Stop CRS on all nodes
    [root@node1 ~]# export CRS_HOME=/u01/app/11.2.0/grid
    [root@node1 ~]# $CRS_HOME/bin/crsctl stop crs
    CRS-2796: The command may not proceed when Cluster Ready Services is not running
    CRS-4687: Shutdown command has completed with errors.
    CRS-4000: Command Stop failed, or completed with errors.
    
    If you got error like that try force option
    [root@node1 ~]# $CRS_HOME/bin/crsctl stop crs -f
    CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
    CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'
    CRS-2673: Attempting to stop 'ora.evmd' on 'node1'
    CRS-2673: Attempting to stop 'ora.asm' on 'node1'
    CRS-2673: Attempting to stop 'ora.mdnsd' on 'node1'
    CRS-2677: Stop of 'ora.asm' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node1'
    CRS-2677: Stop of 'ora.mdnsd' on 'node1' succeeded
    CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node1' succeeded
    CRS-2677: Stop of 'ora.evmd' on 'node1' succeeded
    CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.cssd' on 'node1'
    CRS-2677: Stop of 'ora.cssd' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.diskmon' on 'node1'
    CRS-2673: Attempting to stop 'ora.crf' on 'node1'
    CRS-2677: Stop of 'ora.crf' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.gipcd' on 'node1'
    CRS-2677: Stop of 'ora.diskmon' on 'node1' succeeded
    CRS-2677: Stop of 'ora.gipcd' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
    CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
    CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
    CRS-4133: Oracle High Availability Services has been stopped.
    [root@node1 ~]#
    
  3. When OCR file and vote disk are located on ASM disk group running ASM instance is required for recovery. Current version of Oracle Grid Infrastructure (GI) can be started in exclusive mode on one node to allow checks and restore.
    There is a slight change between version 11.2.0.1 and 11.2.0.2 - in first release we need to start GI in exclusive mode and then stop CRS service (as showed below). In 11.2.0.2 we can start GI without CRS but old steps are still working.
    Steps for 11.2.0.1
    [root@node1 ~]# $CRS_HOME/bin/crsctl start crs -excl
    CRS-4123: Oracle High Availability Services has been started.
    CRS-2672: Attempting to start 'ora.mdnsd' on 'node1'
    CRS-2676: Start of 'ora.mdnsd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.gpnpd' on 'node1'
    CRS-2676: Start of 'ora.gpnpd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
    CRS-2672: Attempting to start 'ora.gipcd' on 'node1'
    CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
    CRS-2676: Start of 'ora.gipcd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'node1'
    CRS-2672: Attempting to start 'ora.diskmon' on 'node1'
    CRS-2676: Start of 'ora.diskmon' on 'node1' succeeded
    CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.ctssd' on 'node1'
    CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'node1'
    CRS-2676: Start of 'ora.ctssd' on 'node1' succeeded
    CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
    Start action for HAIP aborted
    CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'node1' failed
    CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'node1'
    CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.asm' on 'node1'
    CRS-2674: Start of 'ora.asm' on 'node1' failed
    CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'
    CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded
    CRS-4000: Command Start failed, or completed with errors.
    [root@node1 ~]#
    
    Errors are expected so now we need to stop CRS (if it has been started)
    [root@node1 ~]# $CRS_HOME/bin/crsctl stop resource ora.crsd -init
    [root@node1 ~]# $CRS_HOME/bin/crsctl stop resource ora.crsd -init
    CRS-2500: Cannot stop resource 'ora.crsd' as it is not running
    CRS-4000: Command Stop failed, or completed with errors.
    
    Steps for 11.2.0.2 - this allow us to avoid starting CRS which is trying to start resources. Issues related to CRS resource will bring down HAIP and ASM - and this is root cause of errors shown in 11.2.0.1 example.
    [root@node1 ~]# $CRS_HOME/bin/crsctl start crs -excl -nocrs
    CRS-4123: Oracle High Availability Services has been started.
    CRS-2672: Attempting to start 'ora.mdnsd' on 'node1'
    CRS-2676: Start of 'ora.mdnsd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.gpnpd' on 'node1'
    CRS-2676: Start of 'ora.gpnpd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
    CRS-2672: Attempting to start 'ora.gipcd' on 'node1'
    CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
    CRS-2676: Start of 'ora.gipcd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'node1'
    CRS-2672: Attempting to start 'ora.diskmon' on 'node1'
    CRS-2676: Start of 'ora.diskmon' on 'node1' succeeded
    CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'node1'
    CRS-2672: Attempting to start 'ora.ctssd' on 'node1'
    CRS-2676: Start of 'ora.ctssd' on 'node1' succeeded
    CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'node1' succeeded
    CRS-2672: Attempting to start 'ora.asm' on 'node1'
    CRS-2676: Start of 'ora.asm' on 'node1' succeeded
    
    No errors has been displayed and GI has been started in exclusive mode.
  4. Now it is time to check if ASM instance is running and recreate ASM disk groups.
    [oracle@node1 ~]$ ps aux | grep pmon
    oracle    8009  0.0  0.9 495804 18564 ?        Ss   20:37   0:00 asm_pmon_+ASM1
    oracle    8260  0.0  0.0  61184   760 pts/3    S+   20:38   0:00 grep pmon
    
    ASM is running. Now we can connect and create a disk group for OCR and vote disk. Be sure that your new group will have proper COMPATIBLE.ASM value. You can put OCR and vote on groups with COMPATIBLE.ASM greater than 11.2.
    [oracle@node1 ~]$ export ORACLE_HOME=/u01/app/11.2.0/grid
    [oracle@node1 ~]$ export PATH=$PATH:$ORACLE_HOME/bin
    [oracle@node1 ~]$ export ORACLE_SID=+ASM1
    [oracle@node1 ~]$ sqlplus / as sysasm
    SQL*Plus: Release 11.2.0.2.0 Production on Tue Aug 30 20:39:14 2011
    Copyright (c) 1982, 2010, Oracle.  All rights reserved.
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
    With the Real Application Clusters and Automatic Storage Management options
    SQL> show parameter asm_di
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    asm_diskgroups                       string
    asm_diskstring                       string
    
    SQL> alter system set asm_diskstring = '/dev/asmdisk*';
    System altered.
    
    SQL> create diskgroup DATA external redundancy disk '/dev/asmdisk11' attribute 'COMPATIBLE.ASM' = '11.2';
    Diskgroup created.
    
  5. In next step we need to create spfile for ASM instance as it will be used for rest of cluster to find out location of ASM disks using 'asm_diskstring'.
    First text init.ora file has been created and then new binary spfile has been created on ASM disk group.
    [oracle@node1 rac-cluster]$ cat /tmp/init.ora
    *.asm_power_limit=1
    *.diagnostic_dest='/u01/app/oracle'
    *.instance_type='asm'
    *.large_pool_size=12M
    *.remote_login_passwordfile='EXCLUSIVE'
    *.asm_diskstring = '/dev/asmdisk*'
    [oracle@node1 rac-cluster]$ sqlplus / as sysasm
    SQL*Plus: Release 11.2.0.2.0 Production on Tue Aug 30 21:23:36 2011
    Copyright (c) 1982, 2010, Oracle.  All rights reserved.
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
    With the Real Application Clusters and Automatic Storage Management options
    
    SQL> create spfile='+DATA' from pfile='/tmp/init.ora';
    File created.
    
  6. When disk group is ready it is time now to restore OCR from backup file.
    Oracle Grid Infrastructure is doing OCR backup automatically into following location $GRID_HOME/cdata/. Let's restore and check if OCR file is recognized properly after it.
    [root@node1 ~]# $CRS_HOME/bin/ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup_20110830_201118.ocr
    
    [root@node1 ~]# $CRS_HOME/bin/ocrcheck
    Status of Oracle Cluster Registry is as follows :
             Version                  :          3
             Total space (kbytes)     :     262120
             Used space (kbytes)      :       3132
             Available space (kbytes) :     258988
             ID                       :  660203047
             Device/File Name         :      +DATA
                                        Device/File integrity check succeeded
                                        Device/File not configured
                                        Device/File not configured
                                        Device/File not configured
                                        Device/File not configured
             Cluster registry integrity check succeeded
             Logical corruption check succeeded
    
  7. Now it is time to restore vote disk. This process will read asm_diskstring from ASM instance and will place vote files on these disks. See what happen when asm_diskstring is empty.
    [root@node1 ~]# $CRS_HOME/bin/crsctl replace votedisk +DATA
    CRS-4602: Failed 27 to add voting file 28652f742fc44f28bfc6d12d1412a604.
    Failed to replace voting disk group with +DATA.
    CRS-4000: Command Replace failed, or completed with errors.
    
    Error message from log file.
    [cssd(7894)]CRS-1638:Unable to locate voting file with ID 1b37b25b-686c4fb4-bfb82eac-357f48df that is being added to the list of configured voting files; details at (:CSSNM00022:) in /u01/app/11.2.0/grid/log/node1/cssd/ocssd.log
    2011-08-30 20:44:12.256
    
    When asm_diskstring is set up properly it is looking much better.
    [root@node1 ~]# $CRS_HOME/bin/crsctl replace votedisk +DATA
    Successful addition of voting disk 4ca8c2b58d394ff1bf7a9b88dd9f5fc3.
    Successfully replaced voting disk group with +DATA.
    CRS-4266: Voting file(s) successfully replaced
    [root@node1 ~]#  $CRS_HOME/bin/crsctl query css votedisk
    ##  STATE    File Universal Id                File Name Disk group
    --  -----    -----------------                --------- ---------
     1. ONLINE   4ca8c2b58d394ff1bf7a9b88dd9f5fc3 (/dev/asmdisk11) [DATA]
    Located 1 voting disk(s).
    
  8. In last step Grid Infrastructure need to be restarted on all nodes.
    [root@node1 ~]#  $CRS_HOME/bin/crsctl stop crs -f
    CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
    CRS-2673: Attempting to stop 'ora.mdnsd' on 'node1'
    CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'
    CRS-2673: Attempting to stop 'ora.asm' on 'node1'
    CRS-2677: Stop of 'ora.asm' on 'node1' succeeded
    CRS-2677: Stop of 'ora.mdnsd' on 'node1' succeeded
    CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.cssd' on 'node1'
    CRS-2677: Stop of 'ora.cssd' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.gipcd' on 'node1'
    CRS-2673: Attempting to stop 'ora.diskmon' on 'node1'
    CRS-2677: Stop of 'ora.diskmon' on 'node1' succeeded
    CRS-2677: Stop of 'ora.gipcd' on 'node1' succeeded
    CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
    CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
    CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
    CRS-4133: Oracle High Availability Services has been stopped.
    
    [root@node1 ~]#  $CRS_HOME/bin/crsctl start crs
    CRS-4123: Oracle High Availability Services has been started.
    
    If everything went well similar result is expected
    [root@node1 ~]# $CRS_HOME/bin/crsctl check cluster -all
    **************************************************************
    node1:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    node2:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    node3:
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    **************************************************************
    
Now Grid Infrastructure is working and if database ASM disk groups has been destroyed as well or we have everything in one ASM disk groups (like in my example) it is time to restore database from backup using Oracle Recovery Manager. But this is a different story.

Hope it help someone to restore Grid Infrastructure after disk crash.
regards,
Marcin

Monday, August 29, 2011

Book review - Oracle 11g R1/R2 Real Application Clusters Essentials

Recently I have read a new book “Oracle 11g R1/R2 Real Application Clusters Essentials” by Ben Prusinski and Syed Jaffer Hussain.. First of all I would like to thanks both authors for good lecture for few days. Whole book is divided into a few parts:
  • introduction to Oracle Real Application Server including High Availability concepts, various storage options and overall Oracle RAC architecture - chapter 1 and 2
  • installation of Oracle RAC components - Clusterware, Automatic Storage Management - chapter 3
  • overview of Oracle ASM - chapter 4
  • cluster components troubleshooting - chapter 5
  • Oracle Database installation  - chapter 6
  • Day by day topics including administration, backup, performance tuning and upgrade of Oracle Real Application Cluster - chapter 6, 7, 8 and 9
  • Real life examples including implementation of Oracle EBS using Oracle RAC - chapter 10 and 11
  • Oracle Maximum Availability - Chapter 12
  • Overview of Oracle tools for more experience users - Appendinx
In a first part authors are presenting High Availability options for Oracle and describing all main hardware components required for Oracle RAC implementation (server, network and storage options). Next chapter is covering an Oracle RAC architecture and it is covering mostly 11g Release 1. There is some information about a 11g Release 2 as well but due to number of changes between both version I would like to see two chapters instead of one - some additional information about a 11gR2 is covered at the end of appendix.

The next part is a guide how to install whole an Oracle RAC infrastructure - Clusterware and Automatic Storage Management in case of 11g Release 1 or Grid Infrastructure in case of version 11g Release 2. It is well documented and I will help new users to go though this process.

The next chapter is covering all topics related to Oracle Automatic Storage Management. Authors are presenting concepts and overview of this component as well as basic administration options. There is also separate part about new features of Oracle ASM 11g Release 2 covering ACFS file system.   

Chapter 5 is about managing and  troubleshooting Oracle RAC components. Some tools and common scenarios based on typical clusterware issues are presented there together with solving tips.

The next chapters (6,7,8 and 9) describe installation of database as well as new features of an Oracle 11g Release 1 and Release 2. In subsequent paragraph we can read about Oracle RAC High Availability features like - Load Balancing, Transparent Application Failover and Fast Connection Failover. After that backup and recovery strategy for database and whole clusterware comes to play. From management perspective this part is very important and should be study carefully. Next part is a overview of performance tuning tools, wait event and statistics important any performance improvement work.

Real life examples is in my opinion most interesting part of the book. All readers can go though typical scenario like adding or deleting node from cluster or converting database from single instance to RAC database. Configuration of Oracle EBS with Oracle RAC is also very well documented.

When I went though whole book I have to say that it was a good lecture and I can recommend it for everybody who want to do first dive into Oracle RAC world. Furthermore I think that more experience users will find this book interesting and worth to read too.

regards,
Marcin

Wednesday, August 24, 2011

Oracle VM 3.0

Announced yesterday Oracle VM 3.0 can be downloaded from edelivery.oracle.com.
I have downloaded Oracle VM server - 177 M and Oracle VM Manager - 2.4 GB. As far as I remember Oracle VM Manager is based on Oracle Database and Weblogic stack and this explain a little bit size of iso image with VM manager.
Documentation can found here -Oracle VM 3.0 documentation.
ps. Going to install it now

regards,
Marcin

Saturday, August 6, 2011

PL/SQL myths busted

RSS feed from Oakie's blog point me to Toon's Koppelaars post and his redirection to Morten Braten blog. Morten wrote excellent post about stored PL/SQL procedures - Mythbusters: Stored Procedures Edition.

regards,
Marcin