My server and client configuration has been presented in part 1 (entry from TNSNAMES.ORA) so this is only short recall:
- Two nodes – RAC1 and RAC2 with SCAN configured
- SQL*Plus 10g Client
I want to present 3 scenarios:
- One node in cluster is running
- Two nodes are running – current instance is crashing
- Fixing configuration
One node in cluster is running
Due to failure or maintenance only one node in cluster is running. RAC One node service is starting on that node and it is opened for all sessions. Irrespective of which node was current previously instance is open on node which is running now. Current status can be checked via raconestatus
[oracle@rac1 ~]$ raconestatus
RAC One Node databases on this cluster:
Database UP Fix Required Current Server Candidate Server Names
======== == ============ ============================== ========================================
testone Y N rac1 rac1 rac2
Available Free Servers:
As we can see there is no available server to move our instance on. If we start Omotion process we will hit error.
[oracle@rac1 ~]$ Omotion
RAC One Node databases on this cluster:
# Database Server Fix Required
=== ======== ============================== ============
[1] testone rac1 N
Enter number of the database to migrate [1]:
Specify maximum time in minutes for migration to complete (max 30) [30]: 2
Available Target Server(s) :
# Server Available
=== ================== =========
ERROR: Cannot find any available servers in this cluster.
Exiting...
[oracle@rac1 ~]$
In case of shutdown node RAC1 all session will be terminated as in single node configuration. Whatever server will be started after both nodes are down will become a current one.
Two nodes are running – current instance is crashing
Both servers are up – instance is running on RAC1 server. Current cluster configuration can be checked via crsctl command
[oracle@rac1 grid]$ $GRID_HOME/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.eons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.ons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac1
ora.oc4j
1 OFFLINE OFFLINE
ora.rac1.vip
1 ONLINE ONLINE rac1
ora.rac2.vip
1 ONLINE ONLINE rac2
ora.scan1.vip
1 ONLINE ONLINE rac1
ora.testone.db
1 ONLINE ONLINE rac1 Open
ora.testone.serviceone.svc
1 ONLINE ONLINE rac1
[oracle@rac2 grid]$
Session is connected to RAC1 server
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> select * from test;
ID
----------
1
2
3
4
5
6
6 rows selected.
SQL> insert into test values (7);
1 row created.
SQL> commit;
Commit complete.
SQL> insert into test values (8);
1 row created.
SQL> select instance_name, host_name from v$instance;
INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_2 rac1
SQL>
Active Query
Now first scenario when I’m repeating a select command in SQL*Plus and in that same time I’m turning off RAC1 server. In that case a active statement is cancelled and client is trying to reconnect at that time.
SQL> select to_char(sysdate,'hh24:mi') from dual;
TO_CH
-----
09:55
SQL> select to_char(sysdate,'hh24:mi') from dual;
select to_char(sysdate,'hh24:mi') from dual
*
ERROR at line 1:
ORA-03135: connection lost contact
SQL> select to_char(sysdate,'hh24:mi') from dual;
ERROR:
ORA-03114: not connected to ORACLE
As you can see command has been cancelled and session has been terminated. There is no other instance up and running yet to there is no target to failover a session. In client session network trace file you can find following order of errors:
- ORA -12547 – TNS Lost Contact
- ORA- 12151 – Bad packages
- ORA-03115 – connection lost contact
After that client is going to reconnect session to other nodes like in normal RAC configuration. Client is calling SCAN address to get another VIP address but cluster is still in that same state – a reconfiguration has been just started – and Oracle Cluster is returning a rac1-vip once again as it is only one VIP configured for database service.
Keep in mind that current, not committed transaction in that case will be rollback.
Idle session
When a session is in an idle state a failover will take place when a first SQL query will be executed. See example
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> select to_char(sysdate,'hh24:mi') from dual;
TO_CH
-----
15:30
SQL> select instance_name, host_name from v$instance;
INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_2 rac1
SQL> select to_char(sysdate,'hh24:mi') from dual;
TO_CH
-----
15:38
SQL>
SQL> select instance_name, host_name from v$instance;
INSTANCE_NAME HOST_NAME
---------------- ----------------------------------------------------------------
testone_1 rac2
SQL>
As you can see a connection has been migrated between servers without any error. This is because a before a last query a new instance was up and running. If we take a look into client network trace file we will see following errors
- ORA- 12151
- ORA- 3113
Keep in mind that current, not committed transaction has to be rollback after session migration.
Fixing configuration
After unexpected crash it is possible that RAC One Node configuration need to be fixed. I have figure out two possible scenarios:
- After node crash and migration of instance and service the crashed node has been rebooted and there are two running instances
[oracle@rac1 grid]$ raconestatus
RAC One Node databases on this cluster:
Database UP Fix Required Current Server Candidate Server Names
======== == ============ ============================== ========================================
testone Y Y rac1 rac2 rac1 rac2
Available Free Servers:
[oracle@rac1 grid]$
Fixing of configuration will close one of instance (on rebooted node) and will clean a configuration.
[oracle@rac1 rac1]$ raconefix
RAC One Node databases on this cluster:
# Database Server Fix Required
=== ======== ============================== ============
[1] testone rac1 Y
Enter number of the database to fix [1]:
If database is up, it will be checked and cleaned after a previous fail over. - Very similar situation like in previous point but only one instance is working and there is a “mess” in RAC service configuration. Oracle RAC One Node fix will clean up a configuration.
At the end I want to mention that I have hit some strange behavior related to migration of instance between servers and starting and stopping servers. I’m not sure if it a bug or not but after whole cluster restart instances have been migrated across.
First configuration was
- RAC1 – instance name : testone_2
- RAC2 – instance name : testone_1
- RAC1 – instance name : testone_1
- RAC2 – instance name : testone_2
Unfortunately after that I was unable to play with RAC One Node and I have to clear a whole configuration and create it from scratch.
Next part will be about FAN enabled clients and more about transactions.
1 comments:
This is an excellent article.
Still, if you want something more hands-on, try these:
http://vgrigorian.com/11gsimulator/1_rac11gr2.htm
http://vgrigorian.com/11gsimulator/2_rac11gr2rdbms1.htm
http://vgrigorian.com/11gsimulator/3_rac11gasm.htm
http://vgrigorian.com/11gsimulator/4_11gr2dbcreate.htm
You can find more demos (including dataguard, goldengate, streams) there at http://vgrigorian.com/
Thanks.
Vladimir Grigorian
Post a Comment