Friday, June 7, 2013

DataGuard FastStart Failover configuration with Oracle Wallet

Last two months were very busy for me with lot of work and moving house as well. There is a lot of thoughts in my head I want to write about but some of them have to wait until my lab will be online again.Recently I was working on Oracle DataGuard configuration with FastStart Failover and this is what I want to share with you today.

FastStart Failover is a good solution for automatic promoting standby database to be primary and allow application to keep running in case of primary database failure. I don't want to describe whole solution here but want to focus on Observer problems I had recently.

Observer is a part of DataGuard Broker running on 3rd server and it is used to prevent a brain split between primary and standby database. Observer is started by DGMGRL program and it required a sys user credentials. Customer doesn't want to keep sys password in scripts to start and stop observer so Oracle Wallet looks like a good solution to address this issue.

Configuration overview:
- database name - TESTDB
- unique database names - TESTDB_PRIMARY and TESTDB_STANDBY

Wallet has been created and sys user with alias for primary and standby database has been added to it.
I was able to connect to both servers:

[oracle@orasvr3 ~]$ mkstore -wrl /home/oracle/testwallet -listCredential
Oracle Secret Store Tool : Version 11.2.0.3.0 - Production
Copyright (c) 2004, 2011, Oracle and/or its affiliates. All rights reserved.
Enter wallet password:
List credential (index: connect_string username)
2: TESTDB_STANDBY sys
1: TESTDB_PRIMARY sys
[oracle@orasvr3 ~]$ dgmgrl
DGMGRL for Linux: Version 11.2.0.3.0 - 64bit Production
Copyright (c) 2000, 2009, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
DGMGRL> connect /@TESTDB_PRIMARY
Connected.
DGMGRL> connect /@TESTDB_STANDBY
Connected.
DGMGRL>

Observer has been started and I begun failover tests. Here is a first attempt:
12:09:25.21  Thursday, May 30, 2013
Initiating Fast-Start Failover to database "TESTDB_PRIMARY"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "TESTDB_PRIMARY"
12:09:31.78  Thursday, May 30, 2013

12:10:47.91  Thursday, May 30, 2013
Initiating reinstatement for database "TESTDB_STANDBY"...
Reinstating database "TESTDB_STANDBY", please wait...
Operation requires shutdown of instance "TESTDB" on database "TESTDB_STANDBY"
Shutting down instance "TESTDB"...
ORA-01031: insufficient privileges

Warning: You are no longer connected to ORACLE.

Please complete the following steps and reissue the REINSTATE command:
        shut down instance "TESTDB" of database "TESTDB_STANDBY"
        start up and mount instance "TESTDB" of database "TESTDB_STANDBY"

12:11:06.36  Thursday, May 30, 2013

Looks like Observer was unable to shutdown TESTDB_STANDBY database. This is typical error if there is no DGMGRL services registered in listeners but I was sure network configuration is fine as it was working perfectly fine when observer has been started with user and password in connection string. So this problem has to be related to Oracle Wallet only. There is a not well know switch for DGMGRL program to enable debug mode and have some more information about issue and I decided to use it. First of all I have started with switchover tests as there need this same interaction with starting and stopping instance like failover and are faster to perform.

Here is a output with debug option enabled:
[oracle@orasvr3 ~]$ dgmgrl -debug
DGMGRL for Linux: Version 11.2.0.3.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect /@TESTDB_PRIMARY
[W000 05/30 12:25:02.04] Connecting to database using TESTDB_PRIMARY.
[W000 05/30 12:25:02.09] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 05/30 12:25:02.10] Broker version is '11.2.0.3.0'
Connected.

DGMGRL> switchover to TESTDB_PRIMARY;
Performing switchover NOW, please wait...
New primary database "TESTDB_PRIMARY" is opening...
Operation requires shutdown of instance "TESTDB" on database "TESTDB_STANDBY"
Shutting down instance "TESTDB"...
[W000 05/30 12:26:23.24] Connecting to database using (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGB)(INSTANCE_NAME=TESTDB)(SERVER=dedicated))).
ORA-01031: insufficient privileges

Warning: You are no longer connected to ORACLE.

Please complete the following steps to finish switchover:
        shut down instance "TESTDB" of database "TESTDB_STANDBY"
        start up instance "TESTDB" of database "TESTDB_STANDBY"

  It is trying to connect to TESTDB_STANDBY using following connection description

(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGB)(INSTANCE_NAME=TESTDB)(SERVER=dedicated)))

but this one doesn't exist in wallet. Service _DBG is created by DMON process and it is registered in default listener or listeners specified in local_listener parameter.
I have added above connection string to Oracle Wallet and completed all manual steps before next try. Here is a output for second switchover run:
[oracle@orasvr3 ~]$ dgmgrl -debug
DGMGRL for Linux: Version 11.2.0.3.0 - 64bit Production
Copyright (c) 2000, 2009, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
DGMGRL> connect /@TESTDB_PRIMARY
[W000 06/06 12:50:31.01] Connecting to database using TESTDB_PRIMARY.
[W000 06/06 12:50:31.05] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 06/06 12:50:31.06] Broker version is '11.2.0.3.0'
Connected.
DGMGRL> switchover to TESTDB_PRIMARY;
Performing switchover NOW, please wait...
New primary database "TESTDB_PRIMARY" is opening...
Operation requires shutdown of instance "TESTDB" on database "TESTDB_STANDBY"
Shutting down instance "TESTDB"...
[W000 06/06 12:50:45.95] Connecting to database using (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGB)(INSTANCE_NAME=TESTDB)(SERVER=dedicated))).
ORACLE instance shut down.
Operation requires startup of instance "TESTDB" on database "TESTDB_STANDBY"
Starting instance "TESTDB"...
[W000 06/06 12:50:47.09] Connecting to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED))).
ORA-01031: insufficient privileges
Warning: You are no longer connected to ORACLE.
Please complete the following steps to finish switchover:
        start up and mount instance "TESTDB" of database "TESTDB_STANDBY"
Still there is manual step to do but this time database was shut down automatically, and Observer was unable to connect to start up database. There is other connection string

(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED))).

This one can be found here
DGMGRL> show database verbose TESTDB_STANDBY;

Database - TESTDB_STANDBY

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: OFF
  Instance(s):
    TESTDB

  Properties:
    DGConnectIdentifier             = 'TESTDB_STANDBY'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'SYNC'
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '10'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '4'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = ''
    LogFileNameConvert              = 'a, a'
    FastStartFailoverTarget         = 'TESTDB_PRIMARY'
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    SidName                         = 'TESTDB'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = '/oralogs/arch/TESTDB'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = 'arch_%t_%s_%r.arc'
    TopWaitEvents                   = '(monitor)'

Database Status:
SUCCESS
It has been added to Oracle Wallet as well and now my wallet looks like this
[oracle@orasvr3 ~]$ mkstore -wrl /home/oracle/testwallet -listCredential
Oracle Secret Store Tool : Version 11.2.0.3.0 - Production
Copyright (c) 2004, 2011, Oracle and/or its affiliates. All rights reserved.
Enter wallet password:
List credential (index: connect_string username)
6: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED))) sys
5: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr1)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_PRIMARY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED))) sys
4: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGB)(INSTANCE_NAME=TESTDB)(SERVER=dedicated))) sys
3: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr1)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=TESTDB_PRIMARY_DGB)(INSTANCE_NAME=TESTDB)(SERVER=dedicated))) sys
2: TESTDB_STANDBY sys
1: TESTDB_PRIMARY sys
After manual completion of required steps I run switchover again and this time it completed without any errors
DGMGRL> switchover to TESTDB_PRIMARY;
Performing switchover NOW, please wait...
New primary database "TESTDB_PRIMARY" is opening...
Operation requires shutdown of instance "TESTDB" on database "TESTDB_STANDBY"
Shutting down instance "TESTDB"...
[W000 05/30 12:36:51.39] Connecting to database using (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGB)(INSTANCE_NAME=TESTDB)(SERVER=dedicated))).
ORACLE instance shut down.
Operation requires startup of instance "TESTDB" on database "TESTDB_STANDBY"
Starting instance "TESTDB"...
[W000 05/30 12:36:52.54] Connecting to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED))).
[W000 05/30 12:36:52.58] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
ORA-01034: ORACLE not available
Process ID: 0
Session ID: 363 Serial number: 5

ORACLE instance started.
[W000 05/30 12:36:54.81] Connecting to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED))).
[W000 05/30 12:36:54.84] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 05/30 12:36:54.84] Broker version is '11.2.0.3.0'
alter database  mount

Database mounted.
[W000 05/30 12:37:02.74] Connecting to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED))).
[W000 05/30 12:37:02.77] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 05/30 12:37:02.78] Broker version is '11.2.0.3.0'
Switchover succeeded, new primary is "TESTDB_PRIMARY"
DGMGRL>

Lesson learned:
  • FSFO Observer can work with Oracle Wallet
  • 3 entries are required per database, service name has to be set to : <db name unique>, <db name unique_DBG> and <db name unique_DGMGRL>
  • DGMGRL entry can be checked in database configuration in Broker
  • DBG entry - use -debug mode if you can't find a proper connection string 
Hope it will help you with DataGuard Observer configuration.

regards,
Marcin


Friday, March 29, 2013

dbms_dnfs and clone.pl

If you are using to use dbms_dnfs to clone your database as described in MOS note 1210656.1 or on Tim's Hall blog - Direct NFS (DNFS) Clonedb in Oracle Database 11g Release 2 (Patchset 11.2.0.2) and your database version is 11.2.0.3 there is one additional step which is not covered by clone.pl script and has to be done manually.

In 11.2.0.2 it was enough to run crtdb.sql and dbren.sql scripts generated by clone.pl but this is what happen when you will run dbren.sql on 11.2.0.3
SQL> !cat dbren.sql
declare
begin
dbms_dnfs.clonedb_renamefile('/u01/app/oracle/backup/data_D-TESTSE_I-4169104998_TS-SYSAUX_FNO-2_22o1pnm0' , '/u01/clone/ora_data_CLONE2.dbf');
dbms_dnfs.clonedb_renamefile('/u01/app/oracle/backup/data_D-TESTSE_I-4169104998_TS-SYSTEM_FNO-1_20o1pnjq' , '/u01/clone/ora_data_CLONE3.dbf');
dbms_dnfs.clonedb_renamefile('/u01/app/oracle/backup/data_D-TESTSE_I-4169104998_TS-UNDOTBS1_FNO-3_23o1pnmp' , '/u01/clone/ora_data_CLONE4.dbf');
dbms_dnfs.clonedb_renamefile('/u01/app/oracle/backup/data_D-TESTSE_I-4169104998_TS-USERS_FNO-4_21o1pnkt' , '/u01/clone/ora_data_CLONE5.dbf');
end;
/
show errors;
alter database open resetlogs;
drop tablespace TEMP;
create temporary tablespace TEMP;

SQL> begin
dbms_dnfs.clonedb_renamefile('/u01/app/oracle/backup/data_D-TESTSE_I-4169104998_TS-SYSAUX_FNO-2_22o1pnm0' , '/u01/clone/ora_data_CLONE2.dbf');
dbms_dnfs.clonedb_renamefile('/u01/app/oracle/backup/data_D-TESTSE_I-4169104998_TS-SYSTEM_FNO-1_20o1pnjq' , '/u01/clone/ora_data_CLONE3.dbf');
dbms_dnfs.clonedb_renamefile('/u01/app/oracle/backup/data_D-TESTSE_I-4169104998_TS-UNDOTBS1_FNO-3_23o1pnmp' , '/u01/clone/ora_data_CLONE4.dbf');
dbms_dnfs.clonedb_renamefile('/u01/app/oracle/backup/data_D-TESTSE_I-4169104998_TS-USERS_FNO-4_21o1pnkt' , '/u01/clone/ora_data_CLONE5.dbf');
end;
  7  /
begin
*
ERROR at line 1:
ORA-01511: error in renaming log/data files
ORA-01141: error renaming data file 2 - new file '/u01/clone/ora_data_CLONE2.dbf' not found
ORA-01110: data file 2: '/u01/app/oracle/backup/data_D-TESTSE_I-4169104998_TS-SYSAUX_FNO-2_22o1pnm0'
ORA-17515: Creation of clonedb failed using snapshot file /u01/app/oracle/backup/data_D-TESTSE_I-4169104998_TS-SYSAUX_FNO-2_22o1pnm0
ORA-06512: at "SYS.X$DBMS_DNFS", line 10
ORA-06512: at line 2
After that operation I found following errors in alert log
Fri Mar 29 13:31:20 2013
ERROR: clonedb parameter not set. Make sure clonedb=TRUE is set
Fri Mar 29 13:31:20 2013
Signalling error 1152 for datafile 1!
Signalling error 1152 for datafile 2!
Signalling error 1152 for datafile 4!
Checker run found 4 new persistent data failures

Let's check parameter 
SQL> show parameter clone

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
clonedb                              boolean     FALSE


I have changed that parameter in init.ora and restarted instance.
SQL> show parameter clone

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
clonedb                              boolean     TRUE

SQL> begin
dbms_dnfs.clonedb_renamefile('/u01/app/oracle/backup/data_D-TESTSE_I-4169104998_TS-SYSAUX_FNO-2_22o1pnm0' , '/u01/clone/ora_data_CLONE2.dbf');
dbms_dnfs.clonedb_renamefile('/u01/app/oracle/backup/data_D-TESTSE_I-4169104998_TS-SYSTEM_FNO-1_20o1pnjq' , '/u01/clone/ora_data_CLONE3.dbf');
dbms_dnfs.clonedb_renamefile('/u01/app/oracle/backup/data_D-TESTSE_I-4169104998_TS-UNDOTBS1_FNO-3_23o1pnmp' , '/u01/clone/ora_data_CLONE4.dbf');
dbms_dnfs.clonedb_renamefile('/u01/app/oracle/backup/data_D-TESTSE_I-4169104998_TS-USERS_FNO-4_21o1pnkt' , '/u01/clone/ora_data_CLONE5.dbf');
end;
  7  /

PL/SQL procedure successfully completed.
Now it is working again so time to come back to other tests with cloning.

regards,
Marcin

Wednesday, March 20, 2013

Next OakTable 2012 video is online

Hi,

This time Alex Gorbachev presentation about ASM failures - next videos on the way.


regards,
Marcin

Sunday, March 3, 2013

New OakTable World 2012 video is published

Just published Kevin Closson video from OakTable World 2012.

Enjoy,

Marcin

Thursday, February 21, 2013

Oracle on AIX - where's my cpu time ?

This story begins couple of weeks ago when I was asked to look on performance problem for critical processes before go-live. There were differences between run times between different test systems and between runs with 10046 trace enabled or not.

First of all I started with tracing issue. I have checked server CPU and disk utilization using NMON (tool name is important here). There wasn’t any problem with CPU utilization – or at least this is what I was thinking at that time. Tracing was enabled with level 12 (bind and wait) and it generate 2 GB of trace file for 30 min work. First I decided to limit trace level to 8 and give it another try. In the meantime, I double-checked with Oracle community on Twitter (thanks @CaryMillsap and @MartinBerx) that there is not any know Oracle bugs related to trace and performance. Changing tracing level from 12 to 8 solve issue – I mean there no difference between process running with trace and without trace and also limit size of the file to 130MB from 2 GB (don’t ask ?)

In next step I focused on analyze differences between run times on two systems. It is quite easy to do with trace files and I was hoping for short solution.

My first surprise was that profile of SQL in both trace file were almost identical – there was not OS calls (waits) and query spend whole time on CPU. Execution plan were same and amount of data was same as well. Elapsed time for both executions was different and was equal to wall clock time recorded in process logs. To clarify my finding I will focus now on one UPDATE statement. 

When I compared elapsed time with CPU time for EXEC db call I have seen a huge gap between both values – on faster system ratio between CPU time / elapsed time was around 60 % and slower one around 37 % - remember there is no OS calls (wait events)
According to all my knowledge if there is no OS calls CPU time should be equal to elapsed time with probably some difference related to measurement or rounding. If there are differences, there is a huge chance that Oracle process was per-empted from CPU and was waiting in CPU queue. Well according to NMON stats system was utilized in 30 % only so why Oracle process was to wait in queue if there are lot of CPU’s free ? So maybe NMON is reporting system utilization it in wrong way and server itself is overloaded ? That question had to be answered.
I was lucky enough and I had a chance to run Oracle processes on idle server when no one else was connected. I got small LPAR with one core and 4 logical CPU’s as AIX SMT was enabled. I created a small test 

declare
 a number;
 begin
 for i in 1..100000 loop
 select count(*) into a from dba_users;
 end loop;
 end;
/
and I was surprised with results. I doesn’t trust nmon output anymore so I run topas  observing CPU utilization – it showed around 63 % of utilization for my session ! why not 95 – 100 % ? there was nothing else running there ?
I was even more surprised when I checked trace file – here is output formatted by tkprof for query I was looking at

SQL ID: 821zd9kr9k4yr Plan Hash: 3759210642

SELECT COUNT(*) 
FROM
 DBA_USERS


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 100000      1.66       2.73          0          0          0           0
Fetch   100000     53.75      86.03          0    4200000          0      100000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   200001     55.41      88.77          0    4200000          0      100000
55.51 / 88.71 * 100 % = 62.5 %

Hey ! Dude where is my CPU time ? Did my process spend 33 s in CPU run queue ? on idle box ?

I needed more data and at that stage I had three hypothesis – Oracle bug, configuration problem or SMT option. I had funny feeling that SMT option is related to these results somehow but I checked MOS looking for bugs and asked Twitter friends to run same test on other system. (@MartinDBA  confirmed my findings - Thanks for your help).

Next step was to disable SMT option and I learn that AIX is so nice that you can do it online.  I rerun my test – Bingo – topas is showing 95 – 98 % of CPU utilization for my Oracle process and trace file is confirming it

SQL ID: 821zd9kr9k4yr Plan Hash: 3759210642

SELECT COUNT(*) 
FROM
 DBA_USERS


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 100000      2.69       2.78          0          0          0           0
Fetch   100000     85.08      88.33          0    4200000          0      100000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   200001     87.78      91.12          0    4200000          0      100000
87.78 / 91.12 * 100 % = 96.3 %

This is what I was looking for. Based on both observations I was sure that there is no bug’s in tracing functionality for this version of Oracle (11.2.0.3) and I was sure that Oracle process can spin CPU to almost 100 %. There was still question – why there is a huge gap in cpu to elapsed time with SMT enabled. I start looking for IBM documentation and bugs and after some time I found that page - Understanding Processor Utilization on Power Systems - AIX and this picture (linked from IBM wiki page)



So for new PowerPC 7 if your process is running 1 sec on logical CPU (thread)  CPU utilization (user plus system) is 0.63 sec only. If you compare this one with no SMT enabled (or previous version of CPU) you can learn that if process was running for 1 sec on logical CPU it used to be reported as 1 sec of time used by CPU in user and system mode.

How tools are calculating CPU utilization for one process for  1 sec ? you need to divide CPU time used by this processes by 1 sec. So for pre-powerpc 7 or for power pc 7 without SMT

1 s on cpu / 1 s real time * 100 % = 100 %

For powerpc7 with 4 SMT

0.63 on cpu / 1 s * 100 % = 63 %

Have I seen this number before ? Yes this is CPU time taken from trace file divided by elapsed time ! My mystery has been solved – 63 % is now 100 % at least for that version of AIX and Oracle.


Where is a problem ? 

If you are using Cary Millsap Method-R methodology to diagnose performance problems you need to remember that unaccounted time for trace files will be impacted by this feature/bug/you name it.

ps.
I think this post will be updated when I will have more details.

regards,
Marcin

Saturday, February 9, 2013

OakTable World 2012 - Greg Rahn video

New video from OakTable World 2012 is available - Greg Rahn - Beyond Relational Databases

 

regards
Marcin

Friday, February 1, 2013

New OakTable Video - Delphix presenation

New OakTable World 2012 video from Kyle Hailey presentation has been uploaded into YouTube.


regards,
Marcin