Oracle Performance and Backup Blog: 2010

Tuesday, November 2, 2010

Unsafe deinstall using Oracle Univeral Installer.

Are Oracle Homes totally independent from OUI point of view? This was my impression until yesterday when I hit Oracle OUI de-installer bug in Oracle Real Application Cluster environment. I did 3 simple steps:

I have installed new Oracle Home into new directory on both nodes
There was timing issue between servers so I need to remove new Oracle Home using OUI de-install functionality.
I have corrected timing issue and installed new Oracle Home once again.

All old homes have been used by working Oracle ClusterWare, ASM and database and I believed that there should not be any issues. Unfortunately a few minutes after that work I have got first error message but only for new session:

ORA-27504: IPC error creating OSD context 
ORA-27300: OS system dependent operation: IPC init failed with status:65 
ORA-27301: OS failure message: Package not installed 
ORA-27302: failure occurred at: skgxpcini 
ORA-27303: additional information: libskgxpd.so called

I was surprised and I have checked $ORACLE_HOME/lib/ directory and I have found root cause - libskgxpd.so has very recent modification date and it was date of my installation. I have checked all OUI logs immediately looking for that file and in deinstallation log I found that entry:

INFO: The output of this make operation is also available at: '/opt/app/oracle/product/10.2.0/Db_new/install/make.log'
INFO:
INFO: Start output from spawned process:
INFO: ----------------------------------
INFO:
INFO: rm -f /opt/app/oracle/product/10.2.0/Db_1/lib/libskgxp10.so
INFO: cp /opt/app/oracle/product/10.2.0/Db_1/lib//libskgxpd.so /opt/app/oracle/product/10.2.0/Db_1/lib/libskgxp10.so

But in that same log I have found following message at the end:

INFO: Current Inventory:
        Oracle Home: OraCrs10g_home
                This Oracle Home has not been changed in this session.
        Oracle Home: ASM_HOME
                This Oracle Home has not been changed in this session.
        Oracle Home: RDBMS_HOME
                This Oracle Home has not been changed in this session.
INFO: This deinstallation was successful

So OUI replaced library libskgxp10.so in existing old home (RDBMS_HOME) and at the end of work stated that this home has not been change. There are bugs related to that : 7006848, 5474623.

If you ever hit that error – there is two possible solutions:

Copy backup libskgxp10.so from untouched home if you have home with this same patch level (in my case it was ASM)
Rebuild Oracle using the following steps
- cd $ORACLE_HOME/rdbms/lib
  rename the original library (if exists)
  mv libskgxp10.so libskgxp10.so.old
- Relink to configure UDP for IPC
  make -f ins_rdbms.mk rac_on ipc_udp ioracle
- Check whether the library exists
  ls -l $ORACLE_HOME/lib/libskgxp10.so
- startup the instance

And good advise at the end – your testing environment should be same as production one. If you are going to do some work on RAC test it on your test RAC too.

Sunday, September 12, 2010

Oracle patchest 11.2.0.2

Hi,

Some people tweet about new Oracle patch set 11.2.0.2. I decided to check it too but I can't find it on MOS any more. Not only me as Tim Hall (@oraclebase) tweet about that situation too.

Anyway there is one important note on MOS - 1189783.1 - starting with patch set 11.2.0.2 new Oracle patch sets will be full installations of Oracle Software and patching out-of-place will be preferred one or even mandatory one in case of Grid Infrastructure.

I hope this note doesn't disappear from MOS like patch set and will be still valid after a few days.

Now I'm waiting for 11.2.0.2 to appear once again.

regards,

Marcin

Wednesday, September 8, 2010

New version of Simulating ASH

Hello,

I have mentioned some time ago about using Simulating ASH to solve performance issue in Oracle 9i database.

According to some restrictions I have to customize it a little bit and than I agreed with Kyle Hailey that I will publish it as a new version of S-ASH. A new project has been created on SourceForge to keep a repository and this new code and installation instruction can be find here.

Please feel free to post any comments or remarks. I have some idea about potential enhancement of that tool but I'm open for any new too.

regards,

Marcin

Wednesday, August 4, 2010

Column names and typing errors

Last few days I have hit myself into Oracle "feature" which was one of my favorite when I was helping developers solve strange Oracle issues. I recall one Oracle ANSI SQL syntax issue when Oracle didn't recognize that column names are duplicated in different tables and use a random one in output. This is why I always told people use column name with table name to avoid confusion. Anyway now I decided to blog about that and show why table.column_name syntax is so important.

Let's go through an example - In this syntax equal ?

select name1 from T1 where id1 in (select id1 from T2 where name2='NAME2');

select T1.name1 from T1 where T1.id1 in (select T2.id1 from T2 where name2='NAME2');

Looks OK isn't it ? But it really depend on how tables are defined.

SQL> desc T1
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID1                                                NUMBER
 NAME1                                              VARCHAR2(100)

SQL> desc T2
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID2                                                NUMBER
 NAME2                                              VARCHAR2(100)

SQL> select id1, name1 from T1;

       ID1 NAME1
---------- -------------------------------------------------------
         1 NAME1

SQL> select id2, name2 from T2;

       ID2 NAME2
---------- -------------------------------------------------------
         2 NAME2

Now we can see that there is a typo in second query as there is no ID1 column in T2 table. But lets try to execute both. First we start a query with tablename.columnname syntax :

SQL> select T1.name1 from T1 where T1.id1 in (select T2.id1 from T2 where name2='NAME2');
select T1.name1 from T1 where T1.id1 in (select T2.id1 from T2 where name2='NAME2')
                                                *
ERROR at line 1:
ORA-00904: "T2"."ID1": invalid identifier

Looks OK Oracle found our error and query has not been executed. What about first example ?

SQL> select name1 from T1 where id1 in (select id1 from T2 where name2='NAME2');

NAME1
--------------------------------------------------------------------------------
NAME1

SQL>

It return one row and Oracle didn't find our typing error. Even worse IN filter is true for every T1 row but it should not - if you check rows in T1 and T2 where is no common value between ID1 and ID2.
Why ? We are going to check if ID1 from T1 is included in set of ID1 from T2. But there is no ID1 in T2 so Oracle is using ID1 from T1 as in every correlated subquery. What this subquery return ? It will return ID1 from table T1 for every row in table T2 where name is equal to 'NAME2'. This is not what was excepted.

But is this subquery can be executed as separate query ? Of course not

SQL> select id1 from T2 where name2='NAME2';
select id1 from T2 where name2='NAME2'
       *
ERROR at line 1:
ORA-00904: "ID1": invalid identifier

I'm not sure if that problem can be classified as bug but at least is good to know how Oracle is using columns names and use tablename.columnname syntax to avoid confusion and increase code quality. Especially if there are tables with long column names and there is one letter difference between them.

regards,
Marcin

Thursday, July 29, 2010

Process hung on library cache lock

Customer reported a system slowdown/process hung during small and well defined period of time and it was related to one specific process only. Standard Oracle 9i performance analyze tool - Statspack didn't show anything unusual in that time but shapshots were taking very 30 min and this issue took about 5 minutes.

As we could not replicate issue I thought about SASH (a free simulator of Oracle Active Session History) and I have installed it and configured. From that time I was able to gather and keep information about active sessions.

When next time I got detailed information about similar issue I was able to use collected data and found a problem. After investigation I figured out that process used for that business functionality stuck on “library cache lock” event. When I compared my findings from SASH with Statspack I realize why I didn't spotted it before - "library cache lock" was about 1.8 % of all locks between statspack snapshots. From overall database review there was no problem but from process (and business) perspective that lock hung important process for almost 5 minutes.

Below I described all major steps of my investigation and what was a root cause.
Let's start with SASH output (I just left 3 lines but there was many more similar lines):

SQL> select sample_time,program, session_id, session_state, event, seq#, sql_id, '0' || trim(to_char(p1,'XXXXXXXXXXXXXXXXX')) "p1raw", p3
  2  from v$active_session_history
  3  where 
  4  sample_time >= to_date('18-7-2010 13:50:10','dd-mm-yyyy hh24:mi:ss')
  5  and sample_time <= to_date('18-7-2010 13:54:58','dd-mm-yyyy hh24:mi:ss')
  6  and event like 'library%'
  7  order by sample_time desc
  8  /

SAMPLE_TI PROGRAM        SESSION_ID SESSION_STATE EVENT              SEQ#  SQL_ID     p1raw             P3
--------- -------------- ---------- ------------- ------------------ ----- -------    -----------       ---
18-JUL-10 importantp.exe        472 WAITING       library cache lock  214  1548901218 07000002AC789A98  210
18-JUL-10 otherprogr.exe        436 WAITING       library cache lock    8  2855865705 07000002AC789A98  210
18-JUL-10                       375 WAITING       library cache lock   33   261624711 07000002AC789A98  310

First idea was about parsing problem but when I took a look on SQL it looked OK but it was just entry point to PL/SQL procedure. Next thing was to check event parameters and there was first surprise. According to Oracle documentation and Metalink note 34579.1 that event has following parameters:

p1 is an address of required object - in my case 07000002AC789A98
p3 is a lock type and namespace type (lock mode * 100 + namespace) - in my case 310 or 210

In Oracle definitions (metalink or decode in v$librarycache) namespace number 10 is not defined. See whole list below
•    0 SQL Area
•    1 Table / Procedure / Function / Package Header
•    2 Package Body
•    3 Trigger
•    4 Index
•    5 Cluster
•    6 Object
•    7 Pipe
•    13 Java Source
•    14 Java Resource
•    32 Java Data

So what object it is ? P1 had a address and X$KGLOB has a list of all objects (I checked that table next day but there was no restart and it was a chance that there is still that same object in shared pool as session was running 24/7):

SQL> select kglnaown "Owner", kglnaobj "Object" from x$kglob where kglhdadr='07000002AC789A98'; 

Owner  Object
------ --------
SOFT   SNDQ

What type of objects is it ?

SQL> select object_type, object_name from dba_objects where owner='SOFT' and object_name = 'SNDQ';

OBJECT_TYPE OBJECT_NAME
----------- -----------
QUEUE       SNDQ

Bingo - object was a queue there was a dequeue in that PL/SQL code. My process called importantp.exe was waiting for lock on queue. Let's check lock type:

2 - Share mode
3 - Exclusive mode

For importantp.exe lock mode in p3 parameter was 210 which mean wait for shared lock. But queuing and dequeueing operations are running with shared locks so reason for waiting had to be related to none of them. I found it in last line of my listing there was another processes waiting for exclusive lock on that same queue. Using SQL_ID (261624711) I discovered that there was another PL/SQL piece of code and add_subscriber function was called inside.

That was a explanation of importantp.exe hungs - as dequeue function was called after add_subscriber function it had to wait until DDL command was finished. Add subscriber function had to wait until all previous operations in shared mode will be completed and there was typical chain of locks.

There are bugs in Metalink raised by other people describing similar situations and there is only one workaround do not try any DDL on queues during traffic peaks. This recommendation is well know and most DBAs preventing database from typical DDL operation ( like index rebuild or schema changes) during business hours. The problems starts with hidden DDL commands used in application.

Sunday, July 18, 2010

OEM performance tab and Active Session History data.

Graphic representation of Active Session History data in Database Console or in Grid Control is causing some misunderstanding of how to read it. I have lot of questions about Y scale on OEM performance tab and now I decide to blog about it. I want to describe now OEM is using a ASH date and what we can do with this information. Anyway using SQL instead of OEM is more useful in performance tuning and can be easily focused on existing issue.

For other point of view knowledge of ASH give you opportunity to use free implementation of it developed by Kyle Hailey. (Thanks Kyle for your excellent scripts – this is great staff to learn from). By the way recently I have deployed SASH on big production database running on Oracle 9i and it help me to identify and address performance problems. As soon as I finish all these work I will blog about it too.

CPU utilization

CPU usage on performance tab is in green and in my case it has upper limit set to 2 – as I have two core CPU.

It is quite clear how to read CPU usage and what is a maximum CPU utilization. Scope of time which sessions can spend on CPU has an upper limit which is 100 % utilization of all CPU. It mean that maximum CPU time for session in sampling time can be calculated using following formula

Number of CPU x Sampling Time

So if we want to calculate a CPU usage we need to use following formula:

time consumed per all sessions / (Number of CPU x Sampling Time) * 100 %

ex.
time consumed per all sessions = 16 s
Number of CPU = 2
Sampling time = 15 s

Util of all CPUs = 15 / (2 * 15) * 100 % = 53.3 %

How to calculate CPU utilisation using Active Session History data instead of OEM performance tab ?

Active Session History table is a results of sampling V$SESSION view every second. That mean there is no “real” waiting or executing time in V$ACTIVE_SESSION_HISTORY but just samples. We are assuming that every state in ASH view (WAITING or ON CPU) took exactly 1 s. This is important as we don’t sum waiting time from ASH view but we have to count how many 1 s events were in observed time.
Be aware that DBA_HIST_ACTIVE_SESS_HISTORY has only 1/10 data of V$ACTIVE_SESSIN_HISTORY (only sample_id modulo 10 are copied into it) and each sample is now 10s long if we want to use it for our calculations.

Following query will calculate average CPU utilization for two minutes:

SQL> select trunc(count(*)/
  2  (min(p.value)*(max(sample_id) - min (sample_id)))*100,2) "CPU_UTIL",
  3  max(sample_id) - min (sample_id) "sampling time [s]",
  4  count(*) "time consumed per sessions [s]",
  5  min(p.value) * (max(sample_id) - min (sample_id)) "maximum possible time [s]"
  6  from v$active_session_history,
  7  (select value from v$parameter  where name = 'cpu_count') p
  8  where 1=1
  9  and sample_time >= to_date('18-07-2010 12:23:00','dd-mm-yyyy hh24:mi:ss')
 10  and sample_time <= to_date('18-07-2010 12:25:00','dd-mm-yyyy hh24:mi:ss')
 11  and session_state = 'ON CPU';

  CPU_UTIL sampling time [s] time consumed per sessions [s] maximum possible time [s]
---------- ----------------- ------------------------------ -------------------------
     29.23               118                             69                       236

But this doesn’t help us in understanding OEM graph. Refresh rate is set to 15 sec so we need to display data from that period split into 15 sec slices.

SQL> select trunc(to_char(sample_time,'SSSSS')/15) "15s_sample", trunc(count(*)/
  2  (min(p.value)*15)*100,2) "CPU_UTIL",
  3  count(*) "time consumed per sessions [s]"
  4  from v$active_session_history,
  5  (select value from v$parameter  where name = 'cpu_count') p
  6  where 1=1
  7  and sample_time >= to_date('18-07-2010 12:23:00','dd-mm-yyyy hh24:mi:ss')
  8  and sample_time <= to_date('18-07-2010 12:25:00','dd-mm-yyyy hh24:mi:ss')
  9  and session_state = 'ON CPU'
 10  group by trunc(to_char(sample_time,'SSSSS')/15)
 11  order by 1;

15s_sample   CPU_UTIL time consumed per sessions [s]
---------- ---------- ------------------------------
      2972       6.66                              2
      2973       6.66                              2
      2974      53.33                             16
      2975      23.33                              7
      2976      36.66                             11
      2977      26.66                              8
      2978      33.33                             10
      2979      43.33                             13

8 rows selected.

Now if you use any graphic tool to create a chart using 15s_sample column as X axis and "CPU util" column as Y axis you should get very similar picture to OEM chart.

Wait events

There is no easily calculated upper limit for waiting sessions. Why ? Number of waits per sample is depended on session state and number of waiting sessions. So what is a maximum for that ? Should we use only active sessions or all connected sessions ? The worst case scenario is all sessions are active and all are in waiting state. Both values multiplied each other give us a potential upper limit.

Performance tab in OEM is calculating a average of waits in sample time divided by groups and all groups are displayed using stacked area graph.

Following SQL return data used to create following graphs

SQL> select trunc(to_char(sample_time,'SSSSS')/15) "15s_sample", wait_class, count(*)/15
  2  from v$active_session_history
  3  where sample_time >= to_date('18-07-2010 12:23:00','dd-mm-yyyy hh24:mi:ss')
  4  and sample_time <= to_date('18-07-2010 12:25:00','dd-mm-yyyy hh24:mi:ss')
  5  and session_state = 'WAITING'
  6  group by trunc(to_char(sample_time,'SSSSS')/15), wait_class order by 1,3
  7  ;

15s_sample WAIT_CLASS                                                       COUNT(*)/15
---------- ---------------------------------------------------------------- -----------
      2974 Other                                                                     .2
      2974 System I/O                                                        .533333333
      2974 Commit                                                                     2
      2974 User I/O                                                          2.06666667
      2974 Concurrency                                                       4.66666667
      2975 Application                                                       .066666667
      2975 System I/O                                                        .933333333
      2975 User I/O                                                                 3.8
      2975 Commit                                                            4.93333333
      2976 Concurrency                                                       .066666667
      2976 Application                                                       .066666667
      2976 Other                                                                     .4
      2976 System I/O                                                               1.2
      2976 Commit                                                                   5.4
      2976 User I/O                                                                 5.8
      2977 System I/O                                                        .866666667
      2977 User I/O                                                          2.93333333
      2977 Commit                                                                   5.6
      2978 Concurrency                                                       .066666667
      2978 System I/O                                                        .933333333
      2978 User I/O                                                          3.26666667
      2978 Commit                                                                     4
      2979 System I/O                                                        .866666667
      2979 User I/O                                                          1.46666667
      2979 Commit                                                            2.53333333

25 rows selected.

As results we have average time spend by all sessions divided per wait class for a 1 second. What it mean that in sample 2976 wait class “Commit” average time was 5.4 ? It exactly mean that in that 15 seconds sampling time all active sessions were 5.4*15 = 81 times waiting for events in commit class. Is it bad or not ? This is a different question and will not be answered in that post.

Now if we take a look on sample nr 2976

2976 Application                                                       .066666667
2976 Other                                                                     .4
2976 System I/O                                                               1.2
2976 Commit                                                                   5.4
2976 User I/O                                                                 5.8

Sum of all waits give us 12.8 and this is more less what we can see as a top value on graph. A differences between graph and our calculation are related to observation period start time and how we split results into 15 sampling time. We are using a group by function based on sample_time truncated into 4 periods of seconds - using group by to_char function (based on Kyle idea).These periods are: 0 – 15, 15- 30 , 30-45, 45-60 but OEM can display data based on different 15 sec groups ex. 10 - 25, 25 - 40 so small differences can occure.

Now the question is – is this a bad thing to have high numbers of OEM graph ? And like always when we are talking about Oracle performance an answer is depends.

What is bad for sure if our CPU usage is going over 85-90 % and stay there for a long time. In that case CPU response time will have good opportunity to increase and whole system performance can go down. What to do with waits ? In my opinion a best way is to create a baseline (even on paper) with average waiting time and react only when system is working over our baseline or what is more important when system users are claiming a performance problems.

Next time I will use drill-into functionality of OEM and will try to explain what other graphs means – unfortunately most of them are changing a scale between main performance graph and detailed ones so if you are not aware of that it can cause a lot of misunderstandings.

Regards,
Marcin

Wednesday, July 14, 2010

Oracle and AIX

Hi,

Very useful document about AIX configuration for Oracle Database provided IBM.

regards,
Marcin

Sunday, June 20, 2010

Grid Infrastructure - Reconfiguration / host name change

In 11g Release 2 Oracle introduced a Grid Infrastructure (GI) which can be also called Oracle Restart if it is implemented for single instance only. If database is based on Oracle ASM there is no other option and GI has to be installed. Since version 11gR2 ASM is not longer part of RDBMS but part of Grid Infrastructure.

I would like to present how to perform reconfiguration of Grid Infrastructure on one server together with change of server name and IP. Of course reconfiguration can be done in case of other problems and I want to show that it isn't very complicated.

Part 1 - Gathering current configuration

Information about current configuration of ASM and LISTENER should be well documented so in case of reconfiguration it can be easily used.

[oracle@piorovm ~]$ srvctl config asm
ASM home: /u01/app/11.2.0/grid
ASM listener: LISTENER
Spfile: +DATA_SOURCE/ASM/ASMPARAMETERFILE/REGISTRY.253.722024685
ASM diskgroup discovery string: /dev/sd*

[oracle@piorovm ~]$ srvctl config listener
Name: LISTENER
Home: /u01/app/11.2.0/grid
End points: TCP:1521

In addition all ASM disk group names should be documented. If there is no information about group names this information could be read from ASM disks even when ASM instance is down.

In that example information from header of all disks matching string '/dev/sd*' will be displayed.

[oracle@piorovm ~]$ asmcmd
Connected to an idle instance.
ASMCMD> lsdsk -I -p /dev/sd*
Disk_Num  Disk_group   Header_Stat  Path
       0  DATA_SOURCE  MEMBER       /dev/sdb1
ASMCMD>

Part 2- Grid Control deconfiguration

Current Grid Infrastructure configuration has to be deleted. To achieve it roothas.pl script with proper attributes has to be executed. If Grid Infrastructure was working all resources will be stopped automatically before deletion. This script has to be run as root user.

[root@piorovm install]# cd /u01/app/11.2.0/grid/crs/install
[root@piorovm install]# ./roothas.pl -delete -force
2010-06-18 14:40:33: Checking for super user privileges
2010-06-18 14:40:33: User has super user privileges
2010-06-18 14:40:33: Parsing the host name
Using configuration parameter file: ./crsconfig_params
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Delete failed, or completed with errors.
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
ACFS-9200: Supported
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 0.077012 seconds, 136 MB/s
Successfully deconfigured Oracle Restart stack

Part 3- Host rename and Grid Infrastructure configuration

In that part any required changes to host can be made. If reconfiguration was a result of errors
next new configuration of Grid Infrastructure can be done now.

In this example both IP and host name have been changed.

[oracle@piorovm2 ~]$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
10.10.30.150    piorovm2.localdomain piorovm2
[oracle@piorovm2 ~]$ hostname
piorovm2.localdomain

As a first step of Grid Infrastructure configuration "roothas.pl" script has to be executed.
It will create all local resources and allow future configuration of whole stack. This script has to be run as root user.

[root@piorovm install]# cd /u01/app/11.2.0/grid/crs/install
[root@piorovm2 install]# ./roothas.pl -verbose
2010-06-19 13:38:35: Checking for super user privileges
2010-06-19 13:38:35: User has super user privileges
2010-06-19 13:38:35: Parsing the host name
Using configuration parameter file: ./crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'oracle', privgrp 'oinstall'..
Operation successful.
CRS-4664: Node piorovm2 successfully pinned.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting

piorovm2     2010/06/19 13:41:12     /u01/app/11.2.0/grid/cdata/piorovm2/backup_20100619_134112.olr
Successfully configured Oracle Grid Infrastructure for a Standalone Server
[root@piorovm2 install]#

Next part of GI configuration will be executed from "oracle" account who is a owner of GI home.

First of all current status of all resources will be displayed.

[oracle@piorovm2 ~]$ crsctl status resource -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        OFFLINE OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
[oracle@piorovm2 ~]$

Both resources CSSD and DISKMON are offline and are not automatically started. It has to be changed and auto start mode has to be enabled.

[oracle@piorovm2 ~]$ crsctl modify resource "ora.cssd" -attr "AUTO_START=1"
[oracle@piorovm2 ~]$ crsctl modify resource "ora.diskmon" -attr "AUTO_START=1"
[oracle@piorovm2 ~]$ crsctl start resource "ora.cssd"
CRS-2672: Attempting to start 'ora.cssd' on 'piorovm2'
CRS-2672: Attempting to start 'ora.diskmon' on 'piorovm2'
CRS-2676: Start of 'ora.diskmon' on 'piorovm2' succeeded
CRS-2676: Start of 'ora.cssd' on 'piorovm2' succeeded
[oracle@piorovm2 ~]$ crsctl status resource -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       piorovm2
ora.diskmon
      1        ONLINE  ONLINE       piorovm2
[oracle@piorovm2 ~]$

In that step both cluster resources (in single node configuration both resources are not really cluster one) are started and other GI resources like Oracle Listener and Oracle ASM instance can be added.
In this example host name has been changed and appropriate changes has to be done for Oracle Listener configuration. After changes configuration file should look like this.

[oracle@piorovm2 ~]$ cat /u01/app/11.2.0/grid/network/admin/listener.ora
# listener.ora Network Configuration File: /u01/app/11.2.0/grid/network/admin/listener.ora
# Generated by Oracle configuration tools.

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = piorovm2.localdomain)(PORT = 1521))
    )
  )

ADR_BASE_LISTENER = /u01/app/oracle

Listener and Oracle ASM has to be added to Grid Infrastructure and can be managed by it

Listener name LISTENER has been taken from information gathered in first part.

[oracle@piorovm2 ~]$ srvctl add listener -l LISTENER
[oracle@piorovm2 ~]$ srvctl start listener
[oracle@piorovm2 ~]$ crsctl status resource -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       piorovm2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       piorovm2
ora.diskmon
      1        ONLINE  ONLINE       piorovm2

When Listener is up and running Oracle ASM instance can be added to GI.

Listener name, disk string path and path and name of ASM parameter file has been taken from part 1.

[oracle@piorovm2 ~]$ srvctl add asm -l LISTENER -d "/dev/sd*" -p "+DATA_SOURCE/ASM/ASMPARAMETERFILE/REGISTRY.253.722024685"
[oracle@piorovm2 ~]$ srvctl start asm
[oracle@piorovm2 ~]$ crsctl status resource -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA_SOURCE.dg
               ONLINE  ONLINE       piorovm2
ora.LISTENER.lsnr
               ONLINE  ONLINE       piorovm2
ora.asm
               ONLINE  ONLINE       piorovm2                 Started
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       piorovm2
ora.diskmon
      1        ONLINE  ONLINE       piorovm2
[oracle@piorovm2 ~]$

When both - Listener and Oracle ASM instance has been successfully added Grid Infrastructure configuration has been completed.

Oracle Grid Infrastructure and all management commands related in previous version for Oracle RAC only are now required for single instance environment with ASM and in my opinion this is a new direction of Oracle software. All differences between Oracle RAC and single instance will be minimized from maintenance and configuration point of view.

regards,

Marcin

Monday, May 24, 2010

Rename file and BAAG

Hi,

There are two reasons why I'm posting this. First I have read a interview
with Alex Gorbachev about BAAG and second I run into BAAG related problem myself.

This was not pure BAAG problem as I used my knowledge but I forgot about one very important step - VERIFY old knowledge (based on Oracle 8i) with current release. What I did ?

A scenario was very simple - I have a "production" database (thanks God it was only my VM) running on ASM with all files in group called DATA. I cloned that database using a storage snapshot functionality and then I have renamed ASM group to different name and mounted it. So far so good. Next step was to mount a cloned database and rename a files in control file using old fashion command

ALTER DATABASE RENAME FILE '+DATA\xxxx\redo01.log' to '+DATACLONE\xxxx\redo02.log';

Command was successfully finished and control file has been updated. I have checked a alert log and what was my surprise when I saw warning that file source can't be deleted.

WARNING: Cannot delete Oracle managed file

I didn't want to delete a source file - this file is a part of source database. It was very interested and I did another test - I have stopped a source database and run rename command for other file.

And ? Yes a source file has been deleted. What if you do it on production environment without test ?

The most worry thing is that latest SQL reference there is no information about that behavior.

This clause renames only files in the control file. It does not actually rename them on your operating system. The operating system files continue to exist, but Oracle Database no longer uses them.

This is not about ASM but if there is a different behavior for ASM it should be mentioned as well.

So there are three important things to remember:
- ALTER DATABASE RENAME FILE can delete files

- always check what you are going to do and not guess a results

- check relevant of vendor docs, White papers and other well know stuff with current release of software.

regards,

Marcin

Tuesday, May 11, 2010

Persistent ISCSI disks name

Last few days I was working on Oracle cloning possibility using storage snapshots (post will be blogged soon) and I hit a ISCSI issue at the very beginning.
I have been playing with restarting storage and Linux box and from time to time ISCSI disk has been mapped to different block devices (/dev/sdb1 instead of /dev/sdc1). I really need to keep this persistent and I found this instruction for RedHat Linux (so it should work for OEL and CentOS).

Additionally I was looking for possibility to check how IQN shared on storage is mapped to Linux block devices. There is a nice directory layout where you can check that.

[root@piorovm by-path]# pwd
/dev/disk/by-path
[root@piorovm by-path]# ls -l
total 0
lrwxrwxrwx 1 root root  9 May 11 13:01 ip-10.10.20.241:3260-iscsi-iqn.1986-03.com.sun:02:6fe1e73d-8d7a-6ee7-8446-db844ae0d0c7-lun-0 -> ../../sdc
lrwxrwxrwx 1 root root 10 May 11 13:01 ip-10.10.20.241:3260-iscsi-iqn.1986-03.com.sun:02:6fe1e73d-8d7a-6ee7-8446-db844ae0d0c7-lun-0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 May 11 13:01 ip-10.10.20.241:3260-iscsi-iqn.1986-03.com.sun:02:f6746dbb-c95f-e2dc-8bb6-c82d3ef5169a-lun-0 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 11 13:01 ip-10.10.20.241:3260-iscsi-iqn.1986-03.com.sun:02:f6746dbb-c95f-e2dc-8bb6-c82d3ef5169a-lun-0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 May 11 13:00 pci-0000:00:07.1-ide-0:0 -> ../../hda
lrwxrwxrwx 1 root root 10 May 11 13:00 pci-0000:00:07.1-ide-0:0-part1 -> ../../hda1
lrwxrwxrwx 1 root root 10 May 11 13:00 pci-0000:00:07.1-ide-0:0-part2 -> ../../hda2
lrwxrwxrwx 1 root root  9 May 11 13:00 pci-0000:00:07.1-ide-1:0 -> ../../hdc
lrwxrwxrwx 1 root root  9 May 11 13:00 pci-0000:00:10.0-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx 1 root root 10 May 11 13:00 pci-0000:00:10.0-scsi-0:0:0:0-part1 -> ../../sda1
[root@piorovm by-path]#

When you know which IQN is pointed to which block devices we can use mentioned RedHat instruction with some additional remarks.

Here is output from my configuration

Check WWID for block devices

[root@piorovm ~]# scsi_id -g -s /block/sdb
3600144f04be80ec000000c2936224200
[root@piorovm ~]# scsi_id -g -s /block/sdc
3600144f04be7ffdf00000c2936224200

Go to

[root@piorovm ~]# cd /etc/udev/rules.d/

and create a file called 20-names.rules Replace my WWID with results from scsi_id command and put correct names

[root@piorovm rules.d]# cat 20-name.rules
KERNEL=="sd*", BUS=="scsi", PROGRAM="/sbin/scsi_id -g -s", RESULT=="3600144f04be7ffdf00000c2936224200", NAME="sdc"
KERNEL=="sd*", BUS=="scsi", PROGRAM="/sbin/scsi_id -g -s", RESULT=="3600144f04be80ec000000c2936224200", NAME="sdb"

Here is a change with RedHat doc – use double equal mark "==" instead of one equal mark "=" for compare (KERNEL, BUS,RESULT).

Save file and start

[root@piorovm ~]# start_udev
Starting udev:                                             [  OK  ]
[root@piorovm ~]#

regards,
Marcin

Friday, April 23, 2010

Grid Control 11.1- new directories structure

New Grid Control has a different design of directory structure on disk.
During installation process you have been asked for two paths

Middleware home
Instance base

Let’s take a closer look on new structure.

Middleware Home - set to /u01/Middleware

This is directory where Weblogic server has been installed and according to Grid Control Installation documentation

Oracle Middleware Home is the parent directory that has the Oracle WebLogic Server
home, the Web tier instance files, and, optionally, one or more Oracle homes.
This is where the OMS and the Management Agent are deployed.

After Grid Control installation process you will see following structure in that directory:

[oracle@piorovm Middleware]$ du -s -h *
1.4G    agent11g      
4.0K    domain-registry.xml   
172M    jdk160_14_R27.6.5-32  
187M    jrockit_160_14_R27.6.5-32  
8.0K    logs       
121M    modules      
4.0K    ocm.rsp      
3.4G    oms11g       
895M    oracle_common     
887M    Oracle_WT      
60K     registry.dat     
4.0K    registry.xml     
444K    user_projects     
26M     utils       
518M    wlserver_10.3

As you can see the following directories are part of Grid Control:

agent11g - Agent Oracle Home
oms11g - OMS Oracle Home
oracle_common
Oracle_WT - WebTier

Rest of directories are part of Weblogic server.

Instance base - set to /u01/gc_inst

According to Oracle Installation documentation:

Oracle Management Service Instance Base Location is the directory under the parent directory of the Oracle Middleware Home, where the configuration files of OMS a restored.

Let see what is inside

[oracle@piorovm gc_inst]$ du -s -h *
1.4M    em        
895M    user_projects     
53M     WebTierIH1

Short description:

em - sysman configuration and logs files – equivalnet of old OMS ORACLE_HOME/sysman directory
user_projects – deployed OMS server application – equivalent of old OMS ORACLE_HOME/j2ee directory
WebTierIH1 – web tier - equivalent of old OMS ORACLE_HOME/Apache

Log files

As Grid Control has a new directory structure all log files has been moved to different directories too. Please find some new location of well know logs:

Agent trace and log files -
/u01/Middleware/agent11g/sysman/log/
OMS trace and log files -
/u01/gc_inst/em/EMGC_OMS1/sysman/log
Domian log -
/u01/gc_inst/user_projects/domains/GCDomain/servers/EMGC_ADMINSERVER/logs/GCDomain.log
Weblogic OMS server --
/u01/gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/logs
Weblogic Admin server -
/u01/gc_inst/user_projects/domains/GCDomain/servers/EMGC_ADMINSERVER/logs

Where:

/u01/gc_inst - is a Instance Base
GCDomain is a Weblogic domain name

As you can see there are two Weblogic servers – if you are familiar with Weblogic you know that there should be always at least two – one is a Admin server and next servers (can be more than one) are management servers and those servers are running applications. In our case server EMGC_OMS1 is a management server and OMS software is running on it.
In relation to old Oracle Grid Control we can say that new admin server is a equivalent of HOME OC4 container and new management server called EMGS_OMS1 is a equivalent of EM OC4J container.

Grid Control maintenance

Automatic start:

An additional file has been added to Linux starting scripts (/ect/init.d/gcstartup) and it is staring
Grid Control infrastructure. Keep in mind that unlike to Grid Control 10g this script doesn’t include database. It is DBA responsibility to start a database before this script will be started.

Manual start:
To start Grid Control we need to use emctl tool from oms11g Oracle Home.

[oracle@piorovm oms11g]$ ./bin/emctl start oms
Oracle Enterprise Manager 11g Release 1 Grid Control
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
Starting WebTier...
WebTier Successfully Started
Starting Oracle Management Server...
Oracle Management Server Successfully Started
Oracle Management Server is Up
[oracle@piorovm oms11g]$

Status:
From maintenance point of view OPMN functionality has been reduced to WebTier only and management server can be start and stop only via emctl utility.

[oracle@piorovm oms11g]$ ./bin/emctl status oms
Oracle Enterprise Manager 11g Release 1 Grid Control
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
WebTier is Up
Oracle Management Server is Up
[oracle@piorovm oms11g]$

WebTier
Let’s try to check HTTP status using a old OPMN process

[oracle@piorovm ]$ cd /u01/Middleware/oms11g
[oracle@piorovm oms11g]$ ./opmn/bin/opmnctl status
opmnctl: ORACLE_INSTANCE is not set.
opmnctl status requires that ORACLE_INSTANCE be correctly set in your environment.
Make sure you are using ORACLE_INSTANCE/bin/opmnctl, or set ORACLE_INSTANCE
in your environment.

Ok let’s set this environment variable and check again.

[oracle@piorovm oms11g]$ ./opmn/bin/opmnctl status

Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status
---------------------------------+--------------------+---------+---------
ohs1                             | OHS                |   11538 | Alive

Now I can try to restart WebTier process if needed

[oracle@piorovm oms11g]$ ./opmn/bin/opmnctl stopproc
opmnctl stopproc: stopping opmn managed processes...
[oracle@piorovm oms11g]$ ./opmn/bin/opmnctl status

Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status
---------------------------------+--------------------+---------+---------
ohs1                             | OHS                |     N/A | Down

[oracle@piorovm oms11g]$ ./opmn/bin/opmnctl startproc
opmnctl startproc: starting opmn managed processes...
[oracle@piorovm oms11g]$ ./opmn/bin/opmnctl status

Processes in Instance: instance1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status
---------------------------------+--------------------+---------+---------
ohs1                             | OHS                |   18227 | Alive

Agent

There is no changes to agent startup process - only Agent Oracle Home has a different location.

[oracle@piorovm agent11g]$ pwd
/u01/Middleware/agent11g
[oracle@piorovm agent11g]$ ./bin/emctl start agent
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
Starting agent ................................... started.
[oracle@piorovm agent11g]$

Saturday, April 17, 2010

Grid Control 11.1- Installation - OEM

Hello,

Welcome in part 2 of my series about Grid Control 11g.
In the first part I have described how to install Weblogic server which is a one of main components of new Grid Control.

Second required component is a database in releases described in previous post too.I have used 11.1.0.7 with default settings.
According to installation document (included in zip files from eDevlivery) and checks performed by OUI during my installation following parameters has to be set in my database:

open_cursors 300 or greater
processes 500 or greater
session_cached_cursors 200 or greater
log_buffer 10485760 or greater

Next thing is to delete existing database console repository and delete whole db console configuration. This is checked by OUI too so it is better to do it before start.

emca -deconfig dbcontrol db -repos drop

If you are installing Grid Control on RedHat 5.2 (it was my case) in addition to all packages required for 11gR1 database you will need following package:

compat-db-4.2 

I have used that rpm compat-db-4.2.52-5.1.i386.rpm This a package with libdb used by webserver.

Space requirements:

Weblogic - installed in previous part about 1 GB
Grid Control - required space is 8.4 GB

about 7 GB is installed in Middleware home
rest in OEM home

Database - about 1 GB for new tablespaces

After that we can start with installation.

OUI start - ./runInstaller
Confirm that you don't need MOS - at least during installation
Basic installation - without any additional patches
New installation - we don't have any existing Management Services yet
All prerequisites are met
Middleware home loction has to be set to Weblogic installation, OMS location is a directory for Grid Control - see disk spaces requirement above
Provide username and password for new Weblogic domain and password for Node Manager. Weblogic domain can be compared to OC4J but you can have more then one server in domain. Node Manager is a host based tool used for domain management.
Provide database details like hostname, listener port and sys password. OUI will check if database meet all required prerequisites now.

I have deleted DB Console but still got following error - after 2 compilations I just clicked continue. When you drop DB Console repository SYSMAN user should be dropped too - so I don't think it will be a problem.
Enter password for SYSMAN user and path for tablespace files
Registration password for agent and security settings
Ports summary - all ports in one page and easy to edit
Last review
Installation - cafe time ;) it took about 1 hour on my laptop
Root part of work - all in one script
And at least - OEM configuration - another long running process
This is it - yes if you are here you have Grid Control installed

This is all for that post. I have almost no time after installation to play with new Grid Control but I have a free weekend so I hope I will spend some time on investigating it and I hope to blog about it.

regards,
Marcin

Thursday, April 15, 2010

Grid Control 11.1- Installation - Weblogic server

Hello

New Grid Control 11g has been released a view days ago (see my previous post with links).I have spend some time downloading it and now is time to share my installing experience.

First of all you have to know that unlike in previous version of Grid Control, new one is shipped without application server and without database.Both of them has to be downloaded and installed separately.

This a list of all necessary Oracle products:

Oracle Database - 10.2.0.4, 11.1.0.7, 11.2.0.1 plus required patches - see Grid Control Basic Installation Guide
Weblogic 10.3.2 with patches - I was unable to find required patch but decided to try without it
certified operating system - Oracle Enterprise Linux 4, 5.x or RedHat Linux 4, 5.x

I'm wondering what is inside this package Grid Control 10g has about 2.5 GB including database and application server. Grid Control 11g has 4.2 GB without application server (1 GB) and database server (4 GB for Oracle 11gR1).
Let's start with prerequisites.I assume that everybody has database up and running and I will start with Weblogic server. I have some experience with Weblogic version 7 and 8 and I have been very excited to see a new version of it.

Download a Weblogic server - I have used Package installer which has everything inside one package
Login as oracle and start ./wls1032_linux32.bin process
Welcome screen appear - Click Next
Choose Middleware Home and click Next
Confirm that you don't need to configure My Oracle Support
Choose installation type
Confirm Weblogic directory
Installation summary - click Next
Take some - installation in progress
Installation has been completed - We don't need to configure any Weblogic domain now so we can skip Quickstart.

If everything was well you should have all necessary components (database and Weblogic server) to start Grid Control installation. Next post will cover check of database parameters and Grid Control installation. See you soon.

Marcin

Monday, April 12, 2010

Grid Control 11.1

Hi,

Grid Control 11.1 on Linux platform is available to download from Oracle E-delivery.

Thanks to Marco Gralike and Rob Zoeteweij.

There are 3 disks - about 4.2 GB in total - so be aware before you download it. You will need a lot of space for test. Grid Control 11.1 is based on Weblogic server and it need 2 GB of RAM.

According to disk space and memory requirements it is laptop killer.

regards,

Marcin

Friday, April 2, 2010

Shutdown immediate and background sqlplus process

Hello,

I have been playing with Oracle 11g and memory utilization and I finished myself with strange problem. I have started about 60 background sessions using following command:

$ sqlplus pioro/pioro @a.sql

Script a.sql is a simple one and it is allocating about 8 MB of memory for PL/SQL table but this is story for different post.
After some tests I was going to restart database. First thought was simple - shutdown immediate and that's it.
When I come back with next cup of tea database was still open. Hmmm.
Let's check alter.log

License high water mark = 59
Waiting for dispatcher 'D000' to shutdown
Waiting for shared server 'S000' to die
All dispatchers and shared servers shutdown
Tue Mar 30 09:31:00 2010
SHUTDOWN: Active processes prevent shutdown operation
Tue Mar 30 09:36:01 2010
SHUTDOWN: Active processes prevent shutdown operation

and trace file

Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/app/product/11.1.0/db_1
...
*** 2010-03-30 09:25:58.372
*** SESSION ID:(170.15) 2010-03-30 09:25:58.372
*** CLIENT ID:() 2010-03-30 09:25:58.372
*** SERVICE NAME:(SYS$USERS) 2010-03-30 09:25:58.372
*** MODULE NAME:(sqlplus@piorovm.localdomain (TNS V1-V3)) 2010-03-30 09:25:58.372
*** ACTION NAME:() 2010-03-30 09:25:58.372
...
ksukia: Starting kill, force = 0
ksukia: killed 57 out of 57 processes.

*** 2010-03-30 09:26:03.421
ksukia: Starting kill, force = 0
ksukia: Attempt 1 to re-kill process OS PID=24040.
...
ksukia: Attempt 1 to re-kill process OS PID=23958.
ksukia: Attempt 1 to re-kill process OS PID=23954.
ksukia: Attempt 1 to re-kill process OS PID=23951.
ksukia: Attempt 1 to re-kill process OS PID=23949.
...

Hmm - all this session was idle and there was no transactions at all - so why "shutdown immediate" can't shut down database. Next what has to be done is check of status of processes in Linux

$ ps -ef 
...
oracle    7426 11402  0 09:34 pts/1    00:00:00 sqlplus             @a.sql
oracle    7427  7426  0 09:34 ?        00:00:00 [oracle] <defunct>
...

I just put two lines as example - as you can see oracle process 7427 becomes zombie.
So let's kill it and solve problem

$ kill -9 7427
$ ps -ef
...
oracle    7426 11402  0 09:34 pts/1    00:00:00 sqlplus             @a.sql
oracle    7427  7426  0 09:34 ?        00:00:00 [oracle] <defunct>
...

Still working. I google a little bit and found answer (http://www.linuxsa.org.au/tips/zombies.html)
This is true - you can kill dead process. What you have to do is kill his parent process.
PPID for my process is 7426 and this is a sqlplus process starting my script.

$ kill -9 7426

and database is going down. (of course I have to kill all sqlplus processes without that one with shutdown immediate command running)

Most funny thing is that is sqlplus process is running in foreground or when is started from cron there is no issue with other sessions
and shutdown immediate is working well.

Example:

$ ps -ef
...
oracle    8831 11402  0 09:40 pts/1    00:00:00 sqlplus             @a.sql
oracle    8832  8831  1 09:40 ?        00:00:00 oraclepioro (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
...

Shutdown immediate executed on other sqlplus session and following result in observed session

SQL> select * from dual;
select * from dual
*
ERROR at line 1:
ORA-03135: connection lost contact
Process ID: 8832
Session ID: 153 Serial number: 5

This is expected result of shutdown immediate command.

Where is a problem ?
If sqlplus is running as background process and it become idle (no active SQL) will be not finished until it will be killed or taken to foreground using fg.
Be aware if you want to shutdown database and you have some sqlplus'es in background – especially in this same Linux session.

regards,
Marcin

Saturday, March 27, 2010

Oracle Application Server issue

Strange error in Oracle Application Server 10g 10.1.3.3.

I have tried to start it using opmnctl startall and it finish with following error:

10/03/25 08:06:44 WARNING: OC4J will not send ONS ProcReadyPort messages to opmn for service: OC4JServiceInfo id: null protocol: jms hostname: nul
l port: 12602 description: null
10/03/25 08:06:44 null hostname was null
10/03/25 08:06:44 Oracle Containers for J2EE 10g (10.1.3.3.0)  initialized

First idea was that something is wrong with /etc/hosts file or IP has been changed but everything was OK. I have found solution on OTN forum and it works. You have to stop all iAS components and delete all files from following directories:

$ORACLE_HOME/j2ee/home/persistence
$ORACLE_HOME/j2ee/[other OC4J]/persistence

regards,

Marcin

Saturday, March 13, 2010

Oracle ACFS on CentOS

Hi,

This is short instruction how to install Oracle ACFS on CentOS. CentOS is free version of RedHat and it is binary compatible. It is not officially supported by Oracle but all solutions related for RedHat should work on CentOS too.

I have found Laurent blog post about it and I used it as a background of my investigation. When I have used Laurent's steps it was fine until I want to register ACFS in Oracle Cluster Registry. I couldn't do it as not all cluster types has been registered during Grid Infrastructure configuration process.

Here is a simple solution how to fix it for new installations (still working on solution for existing one).

Perform GUI part of Grid Control installation

Before executing root.sh on both nodes edit following file on both nodes : $ORACLE_HOME/lib/osds_acfslib.pm

line 299:    
else   {     
   # we don't know what this is     
   $supported = 1;     
   $vers = "EL5";   
}

Execute a root.sh on both nodes

Now ACFS will be installed and configured and all necessary tools will be copied.

Monday, February 1, 2010

RMAN and SBT_TAPE backup pieces - how to catalog it

Hi,

There has been a discussion on oracle-l mailing list about RMAN and backup pieces on tape which are non in recovery catalog anymore. A few people including me response that those backup pieces can be restored using dbms_backup_restore PL/SQL package if we only know a backup piece name.

Update 01/02/2010: Now is more solutions - I have started my investigation a few days ago.

I investigated this problem a little deeper and I found another solution. I did it because I was not aware that according to official Oracle documentation CATALOG command is not working with ‘SBT_TAPE’ interface.

I have tested this solution on:

Oracle 11.1.0.7 Linux 32 bit
Oracle 10.2.0.1 Windows 32 bit
Oracle 10.2.0.4 Linux 32 bit

I’ve no possibility to check this on other machines now but if you can give me a shout what a results are.

Let’s story begin.

Only thing I knew was a backup piece name. This is a prerequisite but it can be found quite easily by Media Manager admin or by you itself if you know your Media Manager software. See example below for Legato Networker:

[oracle@piorovm usr]$ ./sbin/mminfo -q "client=piorovm" -r "ssid,name,totalsize,savetime,volume"
 ssid      name                                 total  date   volume
4267926146 RMAN:3bl4n6ru_1_1                  9961944 01/29/2010 piorovm.localdomain.001
4234372063 RMAN:3dl4n76v_1_1                805331112 01/29/2010 piorovm.localdomain.001
4217594914 RMAN:3el4n792_1_1               1389143616 01/29/2010 piorovm.localdomain.001
4200817814 RMAN:3fl4n7cl_1_1                477378960 01/29/2010 piorovm.localdomain.001
4251148936 RMAN:c-3537902502-20100129-03     10224108 01/29/2010 piorovm.localdomain.001
4184040645 RMAN:c-3537902502-20100129-04     10224108 01/29/2010 piorovm.localdomain.001

Or another example for Verita Netbackup (by Allen Brandon from Oracle-l list)

 /opt/oracle ->bplist -t 4 -k  -l -s 1/25/2010 -e 1/26/2010 -R /
-rw------- oracle    oracle       12582912 Jan 25 21:21 /c-18889999-20100125-09
-rw------- oracle    oracle        2359296 Jan 25 21:18 /xxprd_al_52855_1_709244222
-rw------- oracle    oracle       3383296K Jan 25 21:15 /xxprd_db_52852_1_709243545

Next thing is to add these names to control file or recovery catalog. First of all I started with very low level solution and dbms_backup_restore package. There is a function called inspectBackupPiece which required only backup piece name. Channel has to be allocated before that command. This gave me a hope that I can do it. I have created a small PL/SQL script and I have executed it.

DECLARE
v_dev varchar2(50);             
v_fullname varchar2(4000);
recid number;
stamp number;

BEGIN
v_dev:=sys.dbms_backup_restore.deviceAllocate(type=>'sbt_tape', ident=>'test');
sys.dbms_backup_restore.inspectBackupPiece('3fl4n7cl_1_1',v_fullname,recid,stamp);
sys.dbms_backup_restore.deviceDeallocate;
END;
/

And that was it – Backup piece has been added to control file.

Ok, if there is a possibility to does this using PL/SQL package there should be possibility do this using RMAN command line. I was trying with allocating ‘SBT_TAPE’ channel and catalog command but without success. Because I have very short memory about command syntax and I know that RMAN is helping with that a little bit I gave a try:

RMAN> catalog  ;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01009: syntax error: found ";": expecting one of: "archivelog, backup, backuppiece, controlfilecopy, clone, datafilecopy, device, db_recovery_file_dest, recovery, start"
RMAN-01007: at line 2 column 1 file: standard input

As you can see in error message is a short information about what is expected. Hmmm a device I think it is a good direction. Let’s try again

RMAN> catalog device type;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01009: syntax error: found ";": expecting one of: "double-quoted-string, disk, equal, identifier, single-quoted-string"
RMAN-01007: at line 1 column 20 file: standard input

Ok that’s it – RMAN is expecting a disk or any double or single quoted string.

Final step is to give RMAN a chance:

RMAN> list backupset;

RMAN>

There is no backup in control file.

RMAN> catalog device type 'sbt_tape' backuppiece '3dl4n76v_1_1';

released channel: ORA_DISK_1
released channel: ORA_SBT_TAPE_1
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: SID=170 device type=SBT_TAPE
channel ORA_SBT_TAPE_1: NMO v5.0.0.0
cataloged backup piece
backup piece handle=3dl4n76v_1_1 RECID=103 STAMP=709599103

and now

RMAN> list backupset;
List of Backup Sets
===================


BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
77      Full    Unknown    SBT_TAPE    00:00:00     29-JAN-10
        BP Key: 103   Status: AVAILABLE  Compressed: NO  Tag: TAG20100129T224031
        Handle: 3dl4n76v_1_1   Media:
  List of Datafiles in backup set 77
  File LV Type Ckp SCN    Ckp Time  Name
  ---- -- ---- ---------- --------- ----
  1       Full 8788185    29-JAN-10 +DATA/pioro/datafile/system.260.695313067

Yes we have one already added via catalog command.

After that I have found two notes on MOS where this syntax is mentioned no 727655.1 and not public no550082.1

This example is very typical for a lot of new solutions – if you are not aware that something is impossible you can try and do that.

regards,
Marcin

Saturday, January 30, 2010

Oracle RAC One Node – part 3

At the beginning I have to apologize everyone who was waiting on that post and I will try to keep posting more regular.

This time before I will present connection test results I need to clarity one thing.
Instance name in One Node RAC can change on each node. This is related to way how instance is started. In case of Omotion migration there is a situation when both instances are up and running so both instances have to have different name, ex.

testone_1 is running on rac1
testone_2 is running on rac2

What happen if one of servers with crash with running instance (ex. rac1 will crashed).
ClusterWare status is showing that last running instance was testone_1 and this instance will be migrated to other node in cluster. This mean that testone_1 instance will be started on rac2 server.

If we fix rac1 server and we want to failback our database using Omotion instance will be started on rac1 server but according to restriction described in previous point instance name has to be different and in that case instance name will be testone_2. So that mean that now we have instance:

testone_1 running on rac2
testone_2 running on rac1

Of course after session migration only one instance will be running but still with that same name.
Where is a problem? You can’t hardcoded instance name and server name in any monitoring tool because instance name can be vary depend on fail over scenario. You should use raconestatus to figure out what is a current status of database.

Now is a time to make a last tests with Java and Fast Connection Failover client.
I have used following code based on Oracle Example with FCF functionality.

After compilation it is high time to start tests:

Case no 1– both servers are up, instance is running on rac1

I have started Oracle test program and run Omotion in rac1 forcing instance migration to rac2.
Below is output from test program:

Url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=rac2-vip)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=serviceone)))
Instance name: testone_1
FCF Activ(cache): 1
FCF Avail(cache): 9

Url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=rac2-vip)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=serviceone)))
java.sql.SQLException: Closed Connection
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:111)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:145)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:207)
        at oracle.jdbc.driver.OracleStatement.ensureOpen(OracleStatement.java:3512)
        at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1244)
        at FCFDemo2.main(FCFDemo2.java:62)
getErrorCode=17008
getSQLState=null
getMessage=Closed Connection
Url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=rac2-vip)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=serviceone)))

Instance name: testone_2
FCF Activ(cache): 1
FCF Avail(cache): 0

As excepted session has been failover from instance to another without any problems with one raised exception.
So let’s test it once again

Url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=rac2-vip)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=serviceone)))
Instance name: testone_2
FCF Activ(cache): 1
FCF Avail(cache): 9

Url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=rac2-vip)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=serviceone)))
java.sql.SQLException: Io exception: Software caused connection abort: recv failed
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:111)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:145)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:254)
        at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:805)
        at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1030)
        at oracle.jdbc.driver.T4CStatement.executeMaybeDescribe(T4CStatement.java:829)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1123)
        at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1263)
        at FCFDemo2.main(FCFDemo2.java:62)
getErrorCode=17002
getSQLState=null
getMessage=Io exception: Software caused connection abort: recv failed
java.sql.SQLException: Listener refused the connection with the following error:
ORA-12514, TNS:listener does not currently know of service requested in connect descriptor
The Connection descriptor used by the client was:
(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=rac2-vip)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=serviceone)))

        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:111)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:260)
        at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:386)
        at oracle.jdbc.driver.PhysicalConnection.(PhysicalConnection.java:413)
        at oracle.jdbc.driver.T4CConnection.(T4CConnection.java:164)
        at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:34)
        at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:752)
        at oracle.jdbc.pool.OracleDataSource.getPhysicalConnection(OracleDataSource.java:296)
        at oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:220)
        at oracle.jdbc.pool.OracleConnectionPoolDataSource.getPhysicalConnection(OracleConnectionPoolDataSource.java:156)
        at oracle.jdbc.pool.OracleConnectionPoolDataSource.getPooledConnection(OracleConnectionPoolDataSource.java:93)
        at oracle.jdbc.pool.OracleImplicitConnectionCache.makeCacheConnection(OracleImplicitConnectionCache.java:1529)
        at oracle.jdbc.pool.OracleImplicitConnectionCache.getCacheConnection(OracleImplicitConnectionCache.java:463)
        at oracle.jdbc.pool.OracleImplicitConnectionCache.getConnection(OracleImplicitConnectionCache.java:332)
        at oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:403)
        at oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:188)
        at oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:164)
        at FCFDemo2.main(FCFDemo2.java:77)

C:\Downloads\fcf_stepbystep>rem Restore old var values

C:\Downloads\fcf_stepbystep>set ORACLE_HOME=C:\oracle\product\10.2.0\db_1

Opps what happen ? We lost a connection and program has been terminated.
Let’s take a look on raised exceptions – first one is related to session closure on active instance. In code there is catch section which catch exception and try to reopen connection. But in that case a new started instance has not been registered in listener yet and another exception came in. This one was not catch by any section and demo has been terminated.
This same behaviour appear in SQL*Plus tests – in my opinion information about instance availability is sending before service is registered in listener. Maybe this is only my VM installation issue but if my assumption about order of event is correct it can happen everyware.

Case no 2 session with transactions

Base on Oracle example code I have added transactions. Below is my version of code:

// 
// last revised 01/06/2006 - Adding connection pool exposure
//      revised 01/05/2006 - kpm use VIP IP addresses.
//      revised 12/21/2005 - PC
// ...........
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.*;
import java.sql.Statement;
import java.util.Properties;
import java.util.*;
import oracle.jdbc.pool.OracleDataSource;
import oracle.jdbc.pool.*;
import oracle.jdbc.*;
public class FCFDemo2 {
   public static void main(String[] args) throws InterruptedException {
   try {
   OracleDataSource ods = new OracleDataSource();
   OracleConnectionCacheManager occm = null;
   ods.setUser("system");
   ods.setPassword("oracle");
   String cacheName = "cache";
   String dbURL="jdbc:oracle:thin:@"     
               +"(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)" +
         "(ADDRESS=(PROTOCOL=TCP)" +
         "(HOST=rac1-vip)(PORT=1521))" +
         "(ADDRESS=(PROTOCOL=TCP)" +
         "(HOST=rac2-vip)(PORT=1521)))" +
         "(CONNECT_DATA=(SERVICE_NAME=serviceone)))";
   System.out.println("Url=" + dbURL );
   ods.setURL(dbURL);
   System.out.println("Url=" + dbURL );
   ods.setConnectionCachingEnabled(true);
   Properties prop = new Properties();
   prop.setProperty("MinLimit", "5");
   prop.setProperty("MaxLimit", "40");
   prop.setProperty("InitialLimit", "10");
   prop.put (oracle.net.ns.SQLnetDef.TCP_CONNTIMEOUT_STR,"" + (5000)); // 5000ms
   ods.setConnectionCacheProperties(prop);
   ods.setFastConnectionFailoverEnabled(true);
   ods.setONSConfiguration("nodes=rac1-vip:6200,rac2-vip:6200");
   ods.setConnectionCacheName(cacheName);   // moved from above
   occm = OracleConnectionCacheManager.getConnectionCacheManagerInstance();
   Connection conn = ods.getConnection();
   conn.setAutoCommit(false);
   System.out.println(conn);
   Statement stmt=conn.createStatement();  // fixed
   ResultSet rs = null;
   int stop = 0;
   
   rs =stmt.executeQuery("truncate table marcin");  
   int i = 0;
   int j=0;
   String name = new String();
      while (true){
      try {
                //System.out.println("Url=" + dbURL );
                rs =stmt.executeQuery("select instance_name from v$instance");
                while(rs.next()) {
 name = rs.getString(1);
                   System.out.println("Instance name: " + name);
                   System.out.println("FCF Activ(" + cacheName + "): " +
                   occm.getNumberOfActiveConnections(cacheName));
                   System.out.println("FCF Avail(" + cacheName + "): " +
                   occm.getNumberOfAvailableConnections(cacheName)+"\n");
                   PreparedStatement pstmt = conn.prepareStatement ("insert into marcin  values (?, ?)");
 pstmt.setInt (1, i);               
                   pstmt.setString (2, name);   
pstmt.execute ();
 System.out.println("inserted " + i +" \n");
 Thread.sleep(1000);
 conn.commit();    
 System.out.println("Commited " + i + " \n");
 i++;
      }
      catch (OracleSQLException sqle) {
     //sqle.printStackTrace();
              System.out.println("getErrorCode=" + sqle.getErrorCode());
              System.out.println("getSQLState=" + sqle.getSQLState());
              System.out.println("getMessage=" + sqle.getMessage());
     //Thread.sleep(1000);
     
     System.out.println("Try number " + j + "\n");
     j++;
     
     try {
     conn =ods.getConnection(); //Re-get the conn
     stmt =conn.createStatement();
     } catch (OracleSQLException sqle1) {
               System.out.println("getErrorCode - reconnect =" + sqle.getErrorCode());
               System.out.println("getSQLState - reconnect =" + sqle.getSQLState());
               System.out.println("getMessage - reconnect=" + sqle.getMessage());
      sqle1.printStackTrace();      
     }
        }
       Thread.sleep(1000);
    System.out.println("New iteration \n");    
       }
   }
   catch(Exception e)
      {
      e.printStackTrace();
      }
   }
}

Start inserting rows.

Instance name: testone_1
FCF Activ(cache): 1
FCF Avail(cache): 9

inserted 185
Commited 185

Instance name: testone_1
FCF Activ(cache): 1
FCF Avail(cache): 9

inserted 186
Commited 186

java.sql.SQLException: Closed Connection
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:111)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:145)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:207)
        at oracle.jdbc.driver.OracleStatement.ensureOpen(OracleStatement.java:3512)
        at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1244)
        at FCFDemo2.main(FCFDemo2.java:70)
getErrorCode=17008
getSQLState=null
getMessage=Closed Connection
Try number 0

Instance name: testone_2
FCF Activ(cache): 1
FCF Avail(cache): 0

inserted 187
Commited 187

Instance name: testone_2
FCF Activ(cache): 1
FCF Avail(cache): 0

inserted 188

Commited 188

Instance name: testone_2
FCF Activ(cache): 1
FCF Avail(cache): 0

As you can see transaction got a exception but because there is a loop in the code transaction has been replay and successfully inserted in database.

SQL> select id,cos from system.marcin where id > 184 order by id;

        ID   INSTANCE_NAME
---------- ------------------------------
       185 testone_1
       186 testone_1
       187 testone_2
       188 testone_2

Any RAC aware software should work with RAC OneNode without any problem like with typical Oracle RAC installation.

This is last part about Oracle RAC OneNode and connection details but I’m sure not a last about RAC itself.

Tuesday, November 2, 2010

Sunday, September 12, 2010

Wednesday, September 8, 2010

Wednesday, August 4, 2010

Thursday, July 29, 2010

Sunday, July 18, 2010

Wednesday, July 14, 2010

Sunday, June 20, 2010

Monday, May 24, 2010

Tuesday, May 11, 2010

Friday, April 23, 2010

Saturday, April 17, 2010

Thursday, April 15, 2010

Monday, April 12, 2010

Friday, April 2, 2010

Saturday, March 27, 2010

Saturday, March 13, 2010

Monday, February 1, 2010

Saturday, January 30, 2010

Subscribe Now: Feed Icon

Search This Blog

Author

Blog Archive

Label Cloud

My Blog List - Alphabetically ordered

Counter