Graphic representation of Active Session History data in Database Console or in Grid Control is causing some misunderstanding of how to read it. I have lot of questions about Y scale on OEM performance tab and now I decide to blog about it. I want to describe now OEM is using a ASH date and what we can do with this information. Anyway using SQL instead of OEM is more useful in performance tuning and can be easily focused on existing issue.
For other point of view knowledge of ASH give you opportunity to use free implementation of it developed by Kyle Hailey. (Thanks Kyle for your excellent scripts – this is great staff to learn from). By the way recently I have deployed SASH on big production database running on Oracle 9i and it help me to identify and address performance problems. As soon as I finish all these work I will blog about it too.
CPU utilization
CPU usage on performance tab is in green and in my case it has upper limit set to 2 – as I have two core CPU.
It is quite clear how to read CPU usage and what is a maximum CPU utilization. Scope of time which sessions can spend on CPU has an upper limit which is 100 % utilization of all CPU. It mean that maximum CPU time for session in sampling time can be calculated using following formula
Number of CPU x Sampling Time
So if we want to calculate a CPU usage we need to use following formula:
time consumed per all sessions / (Number of CPU x Sampling Time) * 100 %
ex.
time consumed per all sessions = 16 s
Number of CPU = 2
Sampling time = 15 s
Util of all CPUs = 15 / (2 * 15) * 100 % = 53.3 %
How to calculate CPU utilisation using Active Session History data instead of OEM performance tab ?
Active Session History table is a results of sampling V$SESSION view every second. That mean there is no “real” waiting or executing time in V$ACTIVE_SESSION_HISTORY but just samples. We are assuming that every state in ASH view (WAITING or ON CPU) took exactly 1 s. This is important as we don’t sum waiting time from ASH view but we have to count how many 1 s events were in observed time.
Be aware that DBA_HIST_ACTIVE_SESS_HISTORY has only 1/10 data of V$ACTIVE_SESSIN_HISTORY (only sample_id modulo 10 are copied into it) and each sample is now 10s long if we want to use it for our calculations.
Following query will calculate average CPU utilization for two minutes:
Number of CPU x Sampling Time
So if we want to calculate a CPU usage we need to use following formula:
time consumed per all sessions / (Number of CPU x Sampling Time) * 100 %
ex.
time consumed per all sessions = 16 s
Number of CPU = 2
Sampling time = 15 s
Util of all CPUs = 15 / (2 * 15) * 100 % = 53.3 %
How to calculate CPU utilisation using Active Session History data instead of OEM performance tab ?
Active Session History table is a results of sampling V$SESSION view every second. That mean there is no “real” waiting or executing time in V$ACTIVE_SESSION_HISTORY but just samples. We are assuming that every state in ASH view (WAITING or ON CPU) took exactly 1 s. This is important as we don’t sum waiting time from ASH view but we have to count how many 1 s events were in observed time.
Be aware that DBA_HIST_ACTIVE_SESS_HISTORY has only 1/10 data of V$ACTIVE_SESSIN_HISTORY (only sample_id modulo 10 are copied into it) and each sample is now 10s long if we want to use it for our calculations.
Following query will calculate average CPU utilization for two minutes:
SQL> select trunc(count(*)/ 2 (min(p.value)*(max(sample_id) - min (sample_id)))*100,2) "CPU_UTIL", 3 max(sample_id) - min (sample_id) "sampling time [s]", 4 count(*) "time consumed per sessions [s]", 5 min(p.value) * (max(sample_id) - min (sample_id)) "maximum possible time [s]" 6 from v$active_session_history, 7 (select value from v$parameter where name = 'cpu_count') p 8 where 1=1 9 and sample_time >= to_date('18-07-2010 12:23:00','dd-mm-yyyy hh24:mi:ss') 10 and sample_time <= to_date('18-07-2010 12:25:00','dd-mm-yyyy hh24:mi:ss') 11 and session_state = 'ON CPU'; CPU_UTIL sampling time [s] time consumed per sessions [s] maximum possible time [s] ---------- ----------------- ------------------------------ ------------------------- 29.23 118 69 236
But this doesn’t help us in understanding OEM graph. Refresh rate is set to 15 sec so we need to display data from that period split into 15 sec slices.
SQL> select trunc(to_char(sample_time,'SSSSS')/15) "15s_sample", trunc(count(*)/ 2 (min(p.value)*15)*100,2) "CPU_UTIL", 3 count(*) "time consumed per sessions [s]" 4 from v$active_session_history, 5 (select value from v$parameter where name = 'cpu_count') p 6 where 1=1 7 and sample_time >= to_date('18-07-2010 12:23:00','dd-mm-yyyy hh24:mi:ss') 8 and sample_time <= to_date('18-07-2010 12:25:00','dd-mm-yyyy hh24:mi:ss') 9 and session_state = 'ON CPU' 10 group by trunc(to_char(sample_time,'SSSSS')/15) 11 order by 1; 15s_sample CPU_UTIL time consumed per sessions [s] ---------- ---------- ------------------------------ 2972 6.66 2 2973 6.66 2 2974 53.33 16 2975 23.33 7 2976 36.66 11 2977 26.66 8 2978 33.33 10 2979 43.33 13 8 rows selected.
Now if you use any graphic tool to create a chart using 15s_sample column as X axis and "CPU util" column as Y axis you should get very similar picture to OEM chart.
Wait events
There is no easily calculated upper limit for waiting sessions. Why ? Number of waits per sample is depended on session state and number of waiting sessions. So what is a maximum for that ? Should we use only active sessions or all connected sessions ? The worst case scenario is all sessions are active and all are in waiting state. Both values multiplied each other give us a potential upper limit.
Performance tab in OEM is calculating a average of waits in sample time divided by groups and all groups are displayed using stacked area graph.
Following SQL return data used to create following graphs
Wait events
There is no easily calculated upper limit for waiting sessions. Why ? Number of waits per sample is depended on session state and number of waiting sessions. So what is a maximum for that ? Should we use only active sessions or all connected sessions ? The worst case scenario is all sessions are active and all are in waiting state. Both values multiplied each other give us a potential upper limit.
Performance tab in OEM is calculating a average of waits in sample time divided by groups and all groups are displayed using stacked area graph.
Following SQL return data used to create following graphs
SQL> select trunc(to_char(sample_time,'SSSSS')/15) "15s_sample", wait_class, count(*)/15 2 from v$active_session_history 3 where sample_time >= to_date('18-07-2010 12:23:00','dd-mm-yyyy hh24:mi:ss') 4 and sample_time <= to_date('18-07-2010 12:25:00','dd-mm-yyyy hh24:mi:ss') 5 and session_state = 'WAITING' 6 group by trunc(to_char(sample_time,'SSSSS')/15), wait_class order by 1,3 7 ; 15s_sample WAIT_CLASS COUNT(*)/15 ---------- ---------------------------------------------------------------- ----------- 2974 Other .2 2974 System I/O .533333333 2974 Commit 2 2974 User I/O 2.06666667 2974 Concurrency 4.66666667 2975 Application .066666667 2975 System I/O .933333333 2975 User I/O 3.8 2975 Commit 4.93333333 2976 Concurrency .066666667 2976 Application .066666667 2976 Other .4 2976 System I/O 1.2 2976 Commit 5.4 2976 User I/O 5.8 2977 System I/O .866666667 2977 User I/O 2.93333333 2977 Commit 5.6 2978 Concurrency .066666667 2978 System I/O .933333333 2978 User I/O 3.26666667 2978 Commit 4 2979 System I/O .866666667 2979 User I/O 1.46666667 2979 Commit 2.53333333 25 rows selected.
As results we have average time spend by all sessions divided per wait class for a 1 second. What it mean that in sample 2976 wait class “Commit” average time was 5.4 ? It exactly mean that in that 15 seconds sampling time all active sessions were 5.4*15 = 81 times waiting for events in commit class. Is it bad or not ? This is a different question and will not be answered in that post.
Now if we take a look on sample nr 2976
Now if we take a look on sample nr 2976
2976 Application .066666667 2976 Other .4 2976 System I/O 1.2 2976 Commit 5.4 2976 User I/O 5.8
Sum of all waits give us 12.8 and this is more less what we can see as a top value on graph. A differences between graph and our calculation are related to observation period start time and how we split results into 15 sampling time. We are using a group by function based on sample_time truncated into 4 periods of seconds - using group by to_char function (based on Kyle idea).These periods are: 0 – 15, 15- 30 , 30-45, 45-60 but OEM can display data based on different 15 sec groups ex. 10 - 25, 25 - 40 so small differences can occure.
Now the question is – is this a bad thing to have high numbers of OEM graph ? And like always when we are talking about Oracle performance an answer is depends.
What is bad for sure if our CPU usage is going over 85-90 % and stay there for a long time. In that case CPU response time will have good opportunity to increase and whole system performance can go down. What to do with waits ? In my opinion a best way is to create a baseline (even on paper) with average waiting time and react only when system is working over our baseline or what is more important when system users are claiming a performance problems.
Next time I will use drill-into functionality of OEM and will try to explain what other graphs means – unfortunately most of them are changing a scale between main performance graph and detailed ones so if you are not aware of that it can cause a lot of misunderstandings.
Regards,
Marcin
Next time I will use drill-into functionality of OEM and will try to explain what other graphs means – unfortunately most of them are changing a scale between main performance graph and detailed ones so if you are not aware of that it can cause a lot of misunderstandings.
Regards,
Marcin
3 comments:
Shouldn't we multiple cpu_count by parallel_threads_per_cpu? Like OEM does when drawing "Maximum CPU line"
How about "Total Disk I/O Per Second" ?
Please.
Sorry for delay.
According to 1st question - I think it is related to type of CPU which you have.
According to 2nd:
What do you mean ? Total Disk I/O is takes as far as I know from operating system.
regards,
Marcin
Post a Comment