Monday, February 1, 2010

RMAN and SBT_TAPE backup pieces - how to catalog it

Hi,
There has been a discussion on oracle-l mailing list about RMAN and backup pieces on tape which are non in recovery catalog anymore. A few people including me response that those backup pieces can be restored using dbms_backup_restore PL/SQL package if we only know a backup piece name.
Update 01/02/2010: Now is more solutions - I have started my investigation a few days ago.
I investigated this problem a little deeper and I found another solution. I did it because I was not aware that according to official Oracle documentation CATALOG command is not working with ‘SBT_TAPE’ interface.
I have tested this solution on:
  • Oracle 11.1.0.7 Linux 32 bit
  • Oracle 10.2.0.1 Windows 32 bit
  • Oracle 10.2.0.4 Linux 32 bit
I’ve no possibility to check this on other machines now but if you can give me a shout what a results are.

Let’s story begin.
Only thing I knew was a backup piece name. This is a prerequisite but it can be found quite easily by Media Manager admin or by you itself if you know your Media Manager software. See example below for Legato Networker:

[oracle@piorovm usr]$ ./sbin/mminfo -q "client=piorovm" -r "ssid,name,totalsize,savetime,volume"
 ssid      name                                 total  date   volume
4267926146 RMAN:3bl4n6ru_1_1                  9961944 01/29/2010 piorovm.localdomain.001
4234372063 RMAN:3dl4n76v_1_1                805331112 01/29/2010 piorovm.localdomain.001
4217594914 RMAN:3el4n792_1_1               1389143616 01/29/2010 piorovm.localdomain.001
4200817814 RMAN:3fl4n7cl_1_1                477378960 01/29/2010 piorovm.localdomain.001
4251148936 RMAN:c-3537902502-20100129-03     10224108 01/29/2010 piorovm.localdomain.001
4184040645 RMAN:c-3537902502-20100129-04     10224108 01/29/2010 piorovm.localdomain.001

Or another example for Verita Netbackup (by Allen Brandon from Oracle-l list)
 /opt/oracle ->bplist -t 4 -k  -l -s 1/25/2010 -e 1/26/2010 -R /
-rw------- oracle    oracle       12582912 Jan 25 21:21 /c-18889999-20100125-09
-rw------- oracle    oracle        2359296 Jan 25 21:18 /xxprd_al_52855_1_709244222
-rw------- oracle    oracle       3383296K Jan 25 21:15 /xxprd_db_52852_1_709243545
Next thing is to add these names to control file or recovery catalog. First of all I started with very low level solution and dbms_backup_restore package. There is a function called inspectBackupPiece which required only backup piece name. Channel has to be allocated before that command. This gave me a hope that I can do it. I have created a small PL/SQL script and I have executed it.

DECLARE
v_dev varchar2(50);             
v_fullname varchar2(4000);
recid number;
stamp number;

BEGIN
v_dev:=sys.dbms_backup_restore.deviceAllocate(type=>'sbt_tape', ident=>'test');
sys.dbms_backup_restore.inspectBackupPiece('3fl4n7cl_1_1',v_fullname,recid,stamp);
sys.dbms_backup_restore.deviceDeallocate;
END;
/

And that was it – Backup piece has been added to control file.
Ok, if there is a possibility to does this using PL/SQL package there should be possibility do this using RMAN command line. I was trying with allocating ‘SBT_TAPE’ channel and catalog command but without success. Because I have very short memory about command syntax and I know that RMAN is helping with that a little bit I gave a try:
RMAN> catalog  ;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01009: syntax error: found ";": expecting one of: "archivelog, backup, backuppiece, controlfilecopy, clone, datafilecopy, device, db_recovery_file_dest, recovery, start"
RMAN-01007: at line 2 column 1 file: standard input

As you can see in error message is a short information about what is expected. Hmmm a device I think it is a good direction. Let’s try again
RMAN> catalog device type;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01009: syntax error: found ";": expecting one of: "double-quoted-string, disk, equal, identifier, single-quoted-string"
RMAN-01007: at line 1 column 20 file: standard input
Ok that’s it – RMAN is expecting a disk or any double or single quoted string.
Final step is to give RMAN a chance:
RMAN> list backupset;

RMAN>

There is no backup in control file.
RMAN> catalog device type 'sbt_tape' backuppiece '3dl4n76v_1_1';

released channel: ORA_DISK_1
released channel: ORA_SBT_TAPE_1
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: SID=170 device type=SBT_TAPE
channel ORA_SBT_TAPE_1: NMO v5.0.0.0
cataloged backup piece
backup piece handle=3dl4n76v_1_1 RECID=103 STAMP=709599103 
and now

RMAN> list backupset;
List of Backup Sets
===================


BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
77      Full    Unknown    SBT_TAPE    00:00:00     29-JAN-10
        BP Key: 103   Status: AVAILABLE  Compressed: NO  Tag: TAG20100129T224031
        Handle: 3dl4n76v_1_1   Media:
  List of Datafiles in backup set 77
  File LV Type Ckp SCN    Ckp Time  Name
  ---- -- ---- ---------- --------- ----
  1       Full 8788185    29-JAN-10 +DATA/pioro/datafile/system.260.695313067 
 
Yes we have one already added via catalog command.
After that I have found two notes on MOS where this syntax is mentioned no 727655.1 and not public no550082.1

This example is very typical for a lot of new solutions – if you are not aware that something is impossible you can try and do that.

regards,
Marcin

Saturday, January 30, 2010

Oracle RAC One Node – part 3

At the beginning I have to apologize everyone who was waiting on that post and I will try to keep posting more regular.

This time before I will present connection test results I need to clarity one thing.
Instance name in One Node RAC can change on each node. This is related to way how instance is started. In case of Omotion migration there is a situation when both instances are up and running so both instances have to have different name, ex. 
  • testone_1 is running on rac1
  • testone_2 is running on rac2
What happen if one of servers with crash with running instance (ex. rac1 will crashed).
ClusterWare status is showing that last running instance was testone_1 and this instance will be migrated to other node in cluster. This mean that testone_1 instance will be started on rac2 server.

If we fix rac1 server and we want to failback our database using Omotion  instance will be started on rac1 server but according to restriction described in previous point instance name has to be different and in that case instance name will be testone_2. So that mean that now we have instance: 
  • testone_1 running on rac2 
  • testone_2 running on rac1

Of course after session migration only one instance will be running but still with that same name.
Where is a problem? You can’t hardcoded instance name and server name in any monitoring tool because instance name can be vary depend on fail over scenario. You should use raconestatus to figure out what is a current status of database.

Now is a time to make a last tests with Java and Fast Connection Failover client.
I have used following code based on Oracle Example with FCF functionality.

After compilation it is high time to start tests:

Case no 1– both servers are up, instance is running on rac1
I have started Oracle test program and run Omotion in rac1 forcing instance migration to rac2.
Below is output from test program:

Url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=rac2-vip)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=serviceone)))
Instance name: testone_1
FCF Activ(cache): 1
FCF Avail(cache): 9

Url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=rac2-vip)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=serviceone)))
java.sql.SQLException: Closed Connection
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:111)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:145)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:207)
        at oracle.jdbc.driver.OracleStatement.ensureOpen(OracleStatement.java:3512)
        at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1244)
        at FCFDemo2.main(FCFDemo2.java:62)
getErrorCode=17008
getSQLState=null
getMessage=Closed Connection
Url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=rac2-vip)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=serviceone)))

Instance name: testone_2
FCF Activ(cache): 1
FCF Avail(cache): 0

As excepted session has been failover from instance to another without any problems with one raised exception.
So let’s test it once again

Url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=rac2-vip)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=serviceone)))
Instance name: testone_2
FCF Activ(cache): 1
FCF Avail(cache): 9

Url=jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=rac2-vip)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=serviceone)))
java.sql.SQLException: Io exception: Software caused connection abort: recv failed
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:111)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:145)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:254)
        at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:805)
        at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1030)
        at oracle.jdbc.driver.T4CStatement.executeMaybeDescribe(T4CStatement.java:829)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1123)
        at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1263)
        at FCFDemo2.main(FCFDemo2.java:62)
getErrorCode=17002
getSQLState=null
getMessage=Io exception: Software caused connection abort: recv failed
java.sql.SQLException: Listener refused the connection with the following error:
ORA-12514, TNS:listener does not currently know of service requested in connect descriptor
The Connection descriptor used by the client was:
(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=rac2-vip)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=serviceone)))

        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:111)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:260)
        at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:386)
        at oracle.jdbc.driver.PhysicalConnection.(PhysicalConnection.java:413)
        at oracle.jdbc.driver.T4CConnection.(T4CConnection.java:164)
        at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:34)
        at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:752)
        at oracle.jdbc.pool.OracleDataSource.getPhysicalConnection(OracleDataSource.java:296)
        at oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:220)
        at oracle.jdbc.pool.OracleConnectionPoolDataSource.getPhysicalConnection(OracleConnectionPoolDataSource.java:156)
        at oracle.jdbc.pool.OracleConnectionPoolDataSource.getPooledConnection(OracleConnectionPoolDataSource.java:93)
        at oracle.jdbc.pool.OracleImplicitConnectionCache.makeCacheConnection(OracleImplicitConnectionCache.java:1529)
        at oracle.jdbc.pool.OracleImplicitConnectionCache.getCacheConnection(OracleImplicitConnectionCache.java:463)
        at oracle.jdbc.pool.OracleImplicitConnectionCache.getConnection(OracleImplicitConnectionCache.java:332)
        at oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:403)
        at oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:188)
        at oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:164)
        at FCFDemo2.main(FCFDemo2.java:77)

C:\Downloads\fcf_stepbystep>rem Restore old var values

C:\Downloads\fcf_stepbystep>set ORACLE_HOME=C:\oracle\product\10.2.0\db_1

Opps what happen ? We lost a connection and program has been terminated.
Let’s take a look on raised exceptions – first one is related to session closure on active instance. In code there is catch section which catch exception and try to reopen connection. But in that case a new started instance has not been registered in listener yet and another exception came in. This one was not catch by any section and demo has been terminated.
This same behaviour appear in SQL*Plus tests – in my opinion information about instance availability is sending before service is registered in listener. Maybe this is only my VM installation issue but if my assumption about order of event is correct it can happen everyware.

Case no 2 session with transactions


Base on Oracle example code I have added transactions. Below is my version of code:

// 
// last revised 01/06/2006 - Adding connection pool exposure
//      revised 01/05/2006 - kpm use VIP IP addresses.
//      revised 12/21/2005 - PC
// ...........
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.*;
import java.sql.Statement;
import java.util.Properties;
import java.util.*;
import oracle.jdbc.pool.OracleDataSource;
import oracle.jdbc.pool.*;
import oracle.jdbc.*;
public class FCFDemo2 {
   public static void main(String[] args) throws InterruptedException {
   try {
   OracleDataSource ods = new OracleDataSource();
   OracleConnectionCacheManager occm = null;
   ods.setUser("system");
   ods.setPassword("oracle");
   String cacheName = "cache";
   String dbURL="jdbc:oracle:thin:@"     
               +"(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)" +
         "(ADDRESS=(PROTOCOL=TCP)" +
         "(HOST=rac1-vip)(PORT=1521))" +
         "(ADDRESS=(PROTOCOL=TCP)" +
         "(HOST=rac2-vip)(PORT=1521)))" +
         "(CONNECT_DATA=(SERVICE_NAME=serviceone)))";
   System.out.println("Url=" + dbURL );
   ods.setURL(dbURL);
   System.out.println("Url=" + dbURL );
   ods.setConnectionCachingEnabled(true);
   Properties prop = new Properties();
   prop.setProperty("MinLimit", "5");
   prop.setProperty("MaxLimit", "40");
   prop.setProperty("InitialLimit", "10");
   prop.put (oracle.net.ns.SQLnetDef.TCP_CONNTIMEOUT_STR,"" + (5000)); // 5000ms
   ods.setConnectionCacheProperties(prop);
   ods.setFastConnectionFailoverEnabled(true);
   ods.setONSConfiguration("nodes=rac1-vip:6200,rac2-vip:6200");
   ods.setConnectionCacheName(cacheName);   // moved from above
   occm = OracleConnectionCacheManager.getConnectionCacheManagerInstance();
   Connection conn = ods.getConnection();
   conn.setAutoCommit(false);
   System.out.println(conn);
   Statement stmt=conn.createStatement();  // fixed
   ResultSet rs = null;
   int stop = 0;
   
   rs =stmt.executeQuery("truncate table marcin");  
   int i = 0;
   int j=0;
   String name = new String();
      while (true){
      try {
                //System.out.println("Url=" + dbURL );
                rs =stmt.executeQuery("select instance_name from v$instance");
                while(rs.next()) {
 name = rs.getString(1);
                   System.out.println("Instance name: " + name);
                   System.out.println("FCF Activ(" + cacheName + "): " +
                   occm.getNumberOfActiveConnections(cacheName));
                   System.out.println("FCF Avail(" + cacheName + "): " +
                   occm.getNumberOfAvailableConnections(cacheName)+"\n");
                   PreparedStatement pstmt = conn.prepareStatement ("insert into marcin  values (?, ?)");
 pstmt.setInt (1, i);               
                   pstmt.setString (2, name);   
pstmt.execute ();
 System.out.println("inserted " + i +" \n");
 Thread.sleep(1000);
 conn.commit();    
 System.out.println("Commited " + i + " \n");
 i++;
      }
      catch (OracleSQLException sqle) {
     //sqle.printStackTrace();
              System.out.println("getErrorCode=" + sqle.getErrorCode());
              System.out.println("getSQLState=" + sqle.getSQLState());
              System.out.println("getMessage=" + sqle.getMessage());
     //Thread.sleep(1000);
     
     System.out.println("Try number " + j + "\n");
     j++;
     
     try {
     conn =ods.getConnection(); //Re-get the conn
     stmt =conn.createStatement();
     } catch (OracleSQLException sqle1) {
               System.out.println("getErrorCode - reconnect =" + sqle.getErrorCode());
               System.out.println("getSQLState - reconnect =" + sqle.getSQLState());
               System.out.println("getMessage - reconnect=" + sqle.getMessage());
      sqle1.printStackTrace();      
     }
        }
       Thread.sleep(1000);
    System.out.println("New iteration \n");    
       }
   }
   catch(Exception e)
      {
      e.printStackTrace();
      }
   }
}


Start inserting rows.

Instance name: testone_1
FCF Activ(cache): 1
FCF Avail(cache): 9

inserted 185
Commited 185

Instance name: testone_1
FCF Activ(cache): 1
FCF Avail(cache): 9

inserted 186
Commited 186

java.sql.SQLException: Closed Connection
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:111)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:145)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:207)
        at oracle.jdbc.driver.OracleStatement.ensureOpen(OracleStatement.java:3512)
        at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1244)
        at FCFDemo2.main(FCFDemo2.java:70)
getErrorCode=17008
getSQLState=null
getMessage=Closed Connection
Try number 0

Instance name: testone_2
FCF Activ(cache): 1
FCF Avail(cache): 0

inserted 187
Commited 187

Instance name: testone_2
FCF Activ(cache): 1
FCF Avail(cache): 0

inserted 188

Commited 188

Instance name: testone_2
FCF Activ(cache): 1
FCF Avail(cache): 0

As you can see transaction got a exception but because there is a loop in the code transaction has been replay and successfully inserted in database.

SQL> select id,cos from system.marcin where id > 184 order by id;

        ID   INSTANCE_NAME
---------- ------------------------------
       185 testone_1
       186 testone_1
       187 testone_2
       188 testone_2

Any RAC aware software should work with RAC OneNode without any problem like with typical Oracle RAC installation.

This is last part about Oracle RAC OneNode and connection details but I’m sure not a last about RAC itself.

Tuesday, December 15, 2009

How to create CSV file in SQL*Plus

As I remember this is one of question which was asked every time on my Oracle DBA classes
"How to save a data in CSV or other text format ?"
Now a answer is simple go to
Shivmohan Purohit Oracle blog and check that post SQL*Plus output.


Monday, December 14, 2009

Oracle RAC One Node – part 2

In this post I want to share with you some scenarios about RAC One Node and client behaviour. In that part I’m using SQL*Plus from 10g so there is no FAN notification, next part will be with FAN enabled client. The main difference between using and not using FAN is that FAN client has a feedback from Oracle Cluster (formally Grid Infrastructure in 11R2 or Oracle ClusterWare in previous releases) and has information about current cluster status. Non FAN client has to wait for TCP/IP time-out for current session before switching to other node. More information about client failover and cluster configuration can be found here.

My server and client configuration has been presented in part 1 (entry from TNSNAMES.ORA) so this is only short recall:
  • Two nodes – RAC1 and RAC2 with SCAN configured
  • SQL*Plus 10g Client


I want to present 3 scenarios:


  1. One node in cluster is running

  2. Two nodes are running – current instance is crashing



  3. Fixing configuration





One node in cluster is running

Due to failure or maintenance only one node in cluster is running. RAC One node service is starting on that node and it is opened for all sessions. Irrespective of which node was current previously instance is open on node which is running now. Current status can be checked via raconestatus

[oracle@rac1 ~]$ raconestatus

RAC One Node databases on this cluster:


Database UP Fix Required Current Server Candidate Server Names
======== == ============ ============================== ========================================
testone Y N rac1 rac1 rac2


Available Free Servers:

As we can see there is no available server to move our instance on. If we start Omotion process we will hit error.


[oracle@rac1 ~]$ Omotion

RAC One Node databases on this cluster:

# Database Server Fix Required
=== ======== ============================== ============
[1] testone rac1 N

Enter number of the database to migrate [1]:

Specify maximum time in minutes for migration to complete (max 30) [30]: 2

Available Target Server(s) :
# Server Available
=== ================== =========
ERROR: Cannot find any available servers in this cluster.

Exiting...

[oracle@rac1 ~]$

In case of shutdown node RAC1 all session will be terminated as in single node configuration. Whatever server will be started after both nodes are down will become a current one.

Two nodes are running – current instance is crashing
Both servers are up – instance is running on RAC1 server. Current cluster configuration can be checked via crsctl command



[oracle@rac1 grid]$ $GRID_HOME/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.eons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.ons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac1
ora.oc4j
1 OFFLINE OFFLINE
ora.rac1.vip
1 ONLINE ONLINE rac1
ora.rac2.vip
1 ONLINE ONLINE rac2
ora.scan1.vip
1 ONLINE ONLINE rac1
ora.testone.db
1 ONLINE ONLINE rac1 Open
ora.testone.serviceone.svc
1 ONLINE ONLINE rac1
[oracle@rac2 grid]$


Session is connected to RAC1 server


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> select * from test;

ID
----------
1
2
3
4
5
6

6 rows selected.

SQL> insert into test values (7);

1 row created.

SQL> commit;

Commit complete.

SQL> insert into test values (8);

1 row created.

SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_2 rac1

SQL>

Active Query
Now first scenario when I’m repeating a select command in SQL*Plus and in that same time I’m turning off RAC1 server. In that case a active statement is cancelled and client is trying to reconnect at that time.


SQL> select to_char(sysdate,'hh24:mi') from dual;

TO_CH
-----
09:55

SQL> select to_char(sysdate,'hh24:mi') from dual;
select to_char(sysdate,'hh24:mi') from dual
*
ERROR at line 1:
ORA-03135: connection lost contact


SQL> select to_char(sysdate,'hh24:mi') from dual;
ERROR:
ORA-03114: not connected to ORACLE

As you can see command has been cancelled and session has been terminated. There is no other instance up and running yet to there is no target to failover a session. In client session network trace file you can find following order of errors:

  • ORA -12547 – TNS Lost Contact

  • ORA- 12151 – Bad packages

  • ORA-03115 – connection lost contact


After that client is going to reconnect session to other nodes like in normal RAC configuration. Client is calling SCAN address to get another VIP address but cluster is still in that same state – a reconfiguration has been just started – and Oracle Cluster is returning a rac1-vip once again as it is only one VIP configured for database service.

Keep in mind that current, not committed transaction in that case will be rollback.


Idle session
When a session is in an idle state a failover will take place when a first SQL query will be executed.
See example

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> select to_char(sysdate,'hh24:mi') from dual;

TO_CH
-----
15:30


SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_2 rac1


SQL> select to_char(sysdate,'hh24:mi') from dual;

TO_CH
-----
15:38

SQL>
SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_1 rac2

SQL>

As you can see a connection has been migrated between servers without any error. This is because a before a last query a new instance was up and running. If we take a look into client network trace file we will see following errors
  • ORA- 12151
  • ORA- 3113
After that client is going to reconnect session to other nodes. Client is calling SCAN address to get another VIP address and cluster in returning a new VIP (rac2-vip) as a database service has been successfully migrated to other node.
Keep in mind that current, not committed transaction has to be rollback after session migration.


Fixing configuration

After unexpected crash it is possible that RAC One Node configuration need to be fixed.
I have figure out two possible scenarios:

  • After node crash and migration of instance and service the crashed node has been rebooted and there are two running instances

    [oracle@rac1 grid]$ raconestatus

    RAC One Node databases on this cluster:


    Database UP Fix Required Current Server Candidate Server Names
    ======== == ============ ============================== ========================================
    testone Y Y rac1 rac2 rac1 rac2


    Available Free Servers:

    [oracle@rac1 grid]$

    Fixing of configuration will close one of instance (on rebooted node) and will clean a configuration.

    [oracle@rac1 rac1]$ raconefix

    RAC One Node databases on this cluster:

    # Database Server Fix Required
    === ======== ============================== ============
    [1] testone rac1 Y

    Enter number of the database to fix [1]:

    If database is up, it will be checked and cleaned after a previous fail over.


  • Very similar situation like in previous point but only one instance is working and there is a “mess” in RAC service configuration. Oracle RAC One Node fix will clean up a configuration.


At the end I want to mention that I have hit some strange behavior related to migration of instance between servers and starting and stopping servers. I’m not sure if it a bug or not but after whole cluster restart instances have been migrated across.
First configuration was

  • RAC1 – instance name : testone_2
  • RAC2 – instance name : testone_1
Yes I know it was my inconsequence in naming so Oracle decided to fix it. No, not at the beginning but after a week of testing and restarts. And now it looks like:

  • RAC1 – instance name : testone_1
  • RAC2 – instance name : testone_2

Unfortunately after that I was unable to play with RAC One Node and I have to clear a whole configuration and create it from scratch.

Next part will be about FAN enabled clients and more about transactions.

Monday, December 7, 2009

Oracle RAC One Node – part 1

This post is about a installation and configuration of Oracle RAC 11gR2 in One Node configuration on VMWare. At the beginning I want describe a RAC One Node a little bit more – this is a new possible configuration with special licence and price which is very similar to fail over cluster configuration. During a normal work only one instance and up and running – like in failover cluster, a difference is in migration process. The following steps are performed during a migration:
  1. Second instance in started on target node
  2. All session are migrated to target instance – TAF has to be enabled on client configuration
  3. Source instance in shutdown in transaction mode
  4. After timeout a source instance in shutdown in abort mode
Installation and configuration in based on Linux CentOS 5.3 and Oracle 11g 11.2.0.1

If you want to skip Grid Infrastructure configuration tips click here

Oracle Grid Infrastructure
Oracle Grid Infrastructure has to be installed and configured on all nodes belongs to Oracle RAC (this is requirement for both RAC configuration – typical RAC and One Node RAC). In previous releases Oracle Grid Infrastructure was called an Oracle ClusterWare. In 11g R2 a name has been change to Grid Infrastructure and a lot of changes have been made. The most important are:
  • Oracle ASM is a part of Grid Infrastructure and not a part of Oracle Database
  • Vote and cluster configuration can use ASM disk or cluster file system
  • Raw or block devices for vote and cluster configuration are not supported during a installation time
  • More RAM is required ?

Last change made me sad as I have only 4 GB RAM on my laptop so I can allocate about 3 GB for two VM. But why do not try. After a few tests I have found a working configuration for both nodes.

RAC1 – node number 1
  • 1.5 GB of RAM allocated for VM
  • 2.0 GB of swap
  • 1 CPU
  • 10 GB of free space for Oracle Homes (both infrastructure and database)

RAC2 – node number 2
  • 1.0 GB of RAM allocated for VM
  • 2.0 GB of swap
  • 1 CPU
  • 10 GB of free space for Oracle Homes (both infrastructure and database)

Yes there is a difference between memory sizes in nodes – start OUI on node number 1 with more allocated memory.

Common configuration:

/etc/hosts
10.10.20.129    rac1.localdomain rac1
192.168.226.10 rac1-priv.localdomain rac1-priv
10.10.20.130 rac2.localdomain rac2
192.168.226.20 rac2-priv.localdomain rac2-priv
10.10.20.200 rac-cluster
10.10.20.210 rac1-vip.localdomain rac1-vip
10.10.20.220 rac2-vip.localdomain rac2-vip
RAC1-VIP and RAC2-VIP has to be assigned to public network and not configured during an installation. There is an additional entry for rac-cluster which is SCAN interface and has to be in same network as both VIP interfaces and not configured during a configuration time.
SCAN interface is a single entry point for cluster – for more information you can see RAC documentation.

Now is time to start OUI and install a Grid Control infrastructure. Below some installation tips:
  • Installation option – choose Install and configure Grid Infrastructure for Cluster
  • Installation type – Choose Typical installation
  • SCAN name – type rac-cluster
  • Add both nodes – RAC1 and RAC2 with proper VIP – note there is no need to private name anymore
  • Test SSH connectivity between nodes and click Setup there are any problems
  • Choose 10.10.20.x subnet as public and 192.168 as private
  • Specify a Oracle Base and Software location (which is a Oracle Home for Grid Infrastructure) - note that Oracle Home for Grid has to be in different location that Oracle Base
  • Select a disk for ASM group – it will be used for vote and configuration file too – if required change a Discovery Path to correct value (ex. /dev/sdc* for block devices or ORCL:* if you are using ASMLib)
  • If there are any problem with kernel setting or missing packages solve it before installation start – you can ignore memory, swap and NTP – but you have to have at least memory size specified above.
After about 15 minutes there is a time for a last step – execute a root.sh to configure and start cluster infrastructure. Run root.sh on node with more memory first.


Oracle Database Oracle Home

Just perform a standard installation of Oracle 11gR2 binaries without database creation.

RAC One Node patch

This is a best time to install a patch which adds One Node support to our database Oracle Home. Why ? Because it has to be installed when DB is down, so before creation of database we don’t need any other actions. Patch number is RACONENODE_p9004119_112010_LINUX.zip and can be found on Oracle Support Pages.

Database creation

The most important thing is to create a database only on one node. On the first screen a RAC database has to be chosen and then only one node (ex. rac1) has to be selected.
The next important thing is storage for a database. In our example all database files will be placed in ASM disk group. This same used to keep Grid Infrastructure cluster configuration.
All other configuration settings don’t have any impact on RAC One Node configuration.

Service configuration

A new service has to be added to support a RAC One Node configuration. This service will be used in our client configuration and will be entry point to our database.

srvctl add service -d testone -s serviceone -r testone1
where
  • testone – is a database name
  • serviceone – is a service name
  • testone1 – is a instance name created in previous point

RAC One node configuration

When database and service are up there is a time to start a RAC One Node configuration.
To do that a raconeinit has to be started.


Candidate Databases on this cluster:
# Database RAC One Node Fix Required
=== ======== ============ ============
[1] testone NO N/A
Enter the database to initialize [1]:
Database testone is now running on server rac1
Candidate servers that may be used for this DB: rac2

Enter the names of additional candidate servers where this DB may run (space delimited): rac2

Please wait, this may take a few minutes to finish.
Database configuration modified.

After that command a new configuration should be in place. Current status can be checked with following command:raconestatus


RAC One Node databases on this cluster:

Database UP Fix Required Current Server Candidate Server Names
======== == ============ ============================== ========================================
testone Y N rac1 rac1 rac2

Available Free Servers:

RAC One node operations

Main RAC One Node operation is moving an instance between nodes. That operation can be done using Omotion tool. Here is a example of Omotion execution

RAC One Node databases on this cluster:

# Database Server Fix Required
=== ======== ============================== ============
[1] testone rac1 N
Enter number of the database to migrate [1]:
Specify maximum time in minutes for migration to complete (max 30) [30]: 5
Available Target Server(s) :
# Server Available
=== ================== =========
[1] rac2 Y
Enter number of the target node [1]:

Omotion Started...
Starting target instance on rac2...
Migrating sessions...
Stopping source instance on rac1...
Omotion Completed...

=== Current Status ===
Database testone is running on node rac2

In that example database instance has been moved from node rac1 to node rac2. Instance on rac1 has been closed in transaction mode. In that scenario there was no remaining session on rac1 instance. When there are outstanding sessions/transactions on source node (in that case rac1) Omotion will shutdown that instance in transaction mode and then after time out will shutdown in abort mode – see example below.

RAC One Node databases on this cluster:

# Database Server Fix Required
=== ======== ============================== ============
[1] testone rac1 N

Enter number of the database to migrate [1]:
Specify maximum time in minutes for migration to complete (max 30) [30]: 5
Available Target Server(s) :
# Server Available
=== ================== =========
[1] rac2 Y

Enter number of the target node [1]:
Omotion Started...
Starting target instance on rac2...
Migrating sessions...
Stopping source instance on rac1...
Timeout exceeded, aborting instance...
Omotion Completed...

=== Current Status ===
Database testone is running on node rac2
Client configuration
How is look like from client perspective?
  1. TAF not configured – session has to be reconnected after instance migration
  2. TAF configured in client TNS – only current transaction has to be rollback.
TAF example:
tnsnames.ora

testone =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = rac-cluster)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = testone)
(FAILOVER_MODE=
(TYPE=select)
(METHOD=basic))
)
)

Take a look on address - there is no VIP any more - now SCAN name has to be entered
in TNS alias and resolved via DNA or hosts as well as all RAC VIP (rac1-vip and rac2-vip).
SQL*Plus test


sqlplus system@testone
SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_1 rac2


SQL> select * from test;

ID
----------
1
2
3
4
5
6
7

7 rows selected.

SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_1 rac2


Omotion has been started.

SQL> select instance_name, host_name from v$instance;
select instance_name, host_name from v$instance
*
ERROR at line 1:
ORA-25402: transaction must roll back


SQL> rollback;

Rollback complete.

SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_2 rac1


SQL> select * from test;

ID
----------
1
2
3
4
5
6

6 rows selected.

SQL>

After migration there was two changes - instance name and host name has been changed - it's not like in typical failover clustre
where instance is migrated from one host to other. In RAC One Node a new instance

is started on second node and during a migration time this configuration is working
as typical RAC.

This is end of part one - next part with more test and operations soon.


Thursday, December 3, 2009

Oracle RAC 11R2 and VMWare

Hi,

Last week I spend on installation and configuration Oracle RAC 11R2 on VMWare.
This nothing special but I want to do that on my laptop with Windows XP and 4 GB of RAM.
Until know it was enough RAM to install Oracle RAC in 9i, 10g and 11R1 but with 11R2 it is a more complicated.

First of all Oracle required 1.5 GB for Grid infrastructure or 2.5 GB for Grid Infrastructure
and database. If you multiple it by 2 - it is 8 GB as a minimum ;) Hmmm we have only half of it.

First attempt was to install it with my original settings - 1 GB RAM and 1 GB swap for each of VM.
It hung during root.sh script at the end of installation.

After that I decided to increase memory one of VM and add swap for both. New configuration was

on node 1 - 1.5 GB RAM and 2 GB swap,
on node 2 - 1.0 GB RAM and 2 GB swap.


That was it. Node with bigger RAM was used to start OUI and was first in cluster.

After successful installation of Grid Infrastructure I decided to decrease SGA size for ASM instances to required minimum (136 MB) and I have started a database installation and configuration. I have installed Oracle binaries and then I have started manually DBCA to create a database - of course SGA has been to required minimum (about 370 MB).
After 25 minutes I have had a working environment.

Next I have installed a patch for RAC One node - a new option for run Oracle in fail over cluster.
A new post about that should appear in a few days.

regards,
Marcin

Wednesday, November 4, 2009

Have to read and remember

Hi,

Third post today (all very short) but I have to clean up my workspace and close all tab in Firefox.
As I always has problem with bookmarks there is a list of interesting documents which I have read and kept open from some time to add it to blog.

Jonathan Lewis - old but very important about indexes.
James Morle's blog - new stuff is coming
Oracle CBO development group blog - some new papers ex. Upgrade from 10g to 11g - CBO changes
Uwe Hesses blog - is count(col) better than count(*)
Miladin Modrakovic's blog - New Oradebug feature in 11R2

Update:
I have forgot about Richard Foote's blog - two entries - Open World Highlights
and another entries about index rebuilding - I'm on Richard side and I prefer to go out on Sunday afternoon ;)