
Sunday, July 1, 2012

Average Active Session in SQL*Plus with refresh

Recently, when I hit a performance issue and figured out that the OEM agent was misconfigured for that host, I wished I had a script to display live Average Active Sessions in SQL*Plus. Of course there are plenty of other great tools, like Tanel Poder's Snapper or Tanel and Adrian Billington's MOATS.
MOATS could be an answer to my needs, but it requires some objects to be created in the database. On the other hand, Snapper uses dynamic views only, but it does not display history, so I can't see at a glance whether system performance has improved or not.
So I decided to address this myself and created a SQL*Plus script that displays AAS history using only dynamic views, like Snapper. I would like to thank Tanel and Adrian for the inspiration on how to build SQL*Plus scripts with live output.

Here is a sample screenshot.


Average Active Sessions is calculated by sampling v$session, and the output is divided into three event categories: CPU, DISK I/O and OTHER. Technically it is possible to add more classes, but it becomes trickier to read them from the screen, so I think these three are a good balance between knowing what is going on and readability. For deeper investigation you can use Snapper.
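
To give an idea of the classification (this is only a simplified sketch of the concept, not the actual topaas.sql code, and the exact split into the three buckets is my own assumption), a single sample of v$session could be grouped like this:

-- simplified sketch: classify active user sessions of one sample into three buckets
select case
         when state != 'WAITING' then 'CPU'
         when wait_class in ('User I/O', 'System I/O') then 'DISK I/O'
         else 'OTHER'
       end as bucket,
       count(*) as active_sessions
  from v$session
 where status = 'ACTIVE'
   and type = 'USER'
 group by case
            when state != 'WAITING' then 'CPU'
            when wait_class in ('User I/O', 'System I/O') then 'DISK I/O'
            else 'OTHER'
          end;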

This tool uses two scripts (both have to be in the same directory):
  • runtopaas.sql - the main script; it parses the run attributes, sets up the run environment for topaas.sql and calls topaas.sql 100 times
  • topaas.sql - samples v$session every 1 s for the time specified by the refresh rate parameter and keeps the samples in a PL/SQL collection. At the end AAS (divided into 3 sections: CPU, Disk I/O and other) is calculated and displayed on screen. In addition, the AAS results are stored in bind variables together with the sample time; when topaas.sql is called the next time it reads the data back from the bind variables, which gives it a history of past AAS values to display on screen (see the sketch after this list). The default configuration displays 100 data points.
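
To illustrate the bind variable trick with a minimal sketch (the variable name and the value format are made up; this is not the real topaas.sql code), a SQL*Plus bind variable survives between script runs in the same session, so appending each sample to it gives the next run access to the history:

-- minimal sketch: history kept in a bind variable between script runs
variable aas_history varchar2(4000)

begin
  -- append the latest (hypothetical) AAS value together with its sample time
  :aas_history := :aas_history || to_char(sysdate, 'hh24:mi:ss') || '=1.7;';
end;
/

-- a later run of the script in the same session can still read it
print aas_history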
How to use it:
  • Change the SQL*Plus window / terminal to 45 characters high and 150 characters wide
  • Run in the SQL*Plus window:
    • SQL> @runtopaas.sql aas:refresh rate - specifies the refresh rate (e.g. 15 s); with 100 samples it keeps 25 min of AAS in the SQL*Plus window. If the script is started again after 100 cycles, or after a user break in the same session, it is still able to display the historical data
    • SQL> @runtopaas.sql aas:refresh rate:reset - like above, but the historical data is cleared
    • SQL> @runtopaas.sql aas:refresh rate:max aas - like above, but the maximum AAS value (Y axis) is set by the user
    • SQL> @runtopaas.sql aas:refresh rate:max aas:reset - like above, but the historical data is cleared
Examples:

SQL> @runtopaas aas:15


Yes, there was a problem: an AAS around 600 is not normal. When the issue has been fixed I want to reset the historical data and run the script again.


SQL> @runtopaas aas:15:reset



After some time:



You can download both scripts here - GitHub repo.


Let me know if this tool is useful for you or if you find any problems with it.
regards,
Marcin

Sunday, November 27, 2011

New release of S-ASH v.2.3

A new version of Oracle Simulating ASH (S-ASH) is ready to download and test. You can find it here - sash-v2.3
New features:
- new, easier installation process
- new metrics history (I/O and system)
- improved support for RAC / multiple databases in one repository
- Oracle Scheduler is used to manage jobs

This version of OraSASH has been tested with Oracle 10g and 11g (R1 and R2) as target databases and Oracle 11gR2 Express Edition as the repository.

regards,
Marcin

Sunday, April 3, 2011

SASH visualization using Excel

OraSASH is a free implementation of the Oracle Active Session History mechanism, but it provides only raw tables with data. To help other people use that functionality, and inspired by Charles Hooper's blog, I have created a simple Excel based GUI interface for SASH. It provides historical and real time views based on the repository. It has been tested on Windows 7 64-bit with 32-bit Excel 2007. To make it work it is necessary to install and configure Oracle ODAC 11.2.0.2. The zip file can be downloaded from the SourceForge repository. Please be aware that this Excel sheet is in an alpha stage and it can contain errors. Read-only access to the OraSASH repository is needed. Macros have to be enabled in Excel, as all functionality is based on Visual Basic.

Setup sheet
Fill out the repository user and TNS alias name on the Setup sheet and click the Start button.


Historical data should be read from the repository for the whole available period of time, and the list of available instances will be filled out as well. When all data has been collected the active sheet will switch to the Graph sheet.
Graph sheet

There are two charts on the Graph sheet. The upper one displays Average Active Sessions; for the AAS concept see John Beresniewicz's presentation or Kyle Hailey's blog.
The chart on the right side displays SQL activity for the selected period of time. The activity percentage is calculated against the number of all samples from that time; the same activity percentage can be seen on the OEM performance tab (a sketch of the calculation is shown below). SQL details such as query text and plan are displayed in the Detail SQL view when the chart bar for the appropriate SQL ID is clicked.
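
The activity percentage itself is a simple ratio of samples. As a hedged sketch (written here against v$active_session_history, since a SASH repository stores the same kind of sampled rows; the real repository table name differs), the share of each SQL_ID over the last 15 minutes could be calculated like this:

-- hedged sketch: share of ASH samples per SQL_ID over the last 15 minutes
select sql_id,
       round(count(*) / sum(count(*)) over () * 100, 2) as activity_pct
  from v$active_session_history
 where sample_time >= sysdate - 15/1440
   and sql_id is not null
 group by sql_id
 order by activity_pct desc;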
Historic view
To change the period displayed on screen, change the start and stop dates and then click the Refresh Historic View button.
Real time view
To see current database activity, switch the radio button to Real Time refresh. The refresh rate can be changed on the same sheet in the Refresh Rate field. It takes one full refresh cycle to pick up a new refresh rate value.
Detail SQL sheet

A separate sheet is created for every SQL query. The sheet name is the SQL_ID, and the available details about the query text and plan are displayed there. In addition, detailed information about wait events and CPU utilization is displayed as well.

If you have any comments or questions please feel free to contact me.
regards,
Marcin

Wednesday, September 8, 2010

New version of Simulating ASH

Hello, 

Some time ago I mentioned using Simulating ASH to solve a performance issue in an Oracle 9i database.
Due to some restrictions I had to customize it a little bit, and then I agreed with Kyle Hailey that I would publish it as a new version of S-ASH. A new project has been created on SourceForge to keep a repository, and this new code and the installation instructions can be found here.

Please feel free to post any comments or remarks. I have some ideas about potential enhancements of the tool, but I'm open to new ones too.

regards,
Marcin

Sunday, July 18, 2010

OEM performance tab and Active Session History data.

The graphic representation of Active Session History data in Database Console or Grid Control causes some misunderstanding about how to read it. I get a lot of questions about the Y scale on the OEM performance tab, so I decided to blog about it. I want to describe how OEM uses the ASH data and what we can do with this information. Anyway, using SQL instead of OEM is more useful in performance tuning and can be easily focused on an existing issue.

From another point of view, knowledge of ASH gives you the opportunity to use the free implementation of it developed by Kyle Hailey (thanks Kyle for your excellent scripts, this is great stuff to learn from). By the way, I have recently deployed SASH on a big production database running on Oracle 9i and it helped me to identify and address performance problems. As soon as I finish all that work I will blog about it too.

CPU utilization
CPU usage on the performance tab is in green and in my case it has an upper limit of 2, as I have a two-core CPU.


It is quite clear how to read CPU usage and what the maximum CPU utilization is. The time which sessions can spend on CPU has an upper limit, which is 100 % utilization of all CPUs. It means that the maximum CPU time available to sessions in the sampling time can be calculated using the following formula:

Number of CPUs x Sampling Time

So if we want to calculate CPU usage we need to use the following formula:

time consumed by all sessions / (Number of CPUs x Sampling Time) * 100 %

e.g.
time consumed by all sessions = 16 s
Number of CPUs = 2
Sampling time = 15 s

Util of all CPUs = 16 / (2 * 15) * 100 % = 53.3 %

How do we calculate CPU utilization using Active Session History data instead of the OEM performance tab?

The Active Session History table is the result of sampling the V$SESSION view every second. That means there is no "real" waiting or executing time in V$ACTIVE_SESSION_HISTORY, just samples. We assume that every state in the ASH view (WAITING or ON CPU) lasted exactly 1 s. This is important, as we don't sum waiting time from the ASH view; instead we count how many 1 s samples there were in the observed time.
Be aware that DBA_HIST_ACTIVE_SESS_HISTORY holds only 1/10 of the data of V$ACTIVE_SESSION_HISTORY (only every 10th sample, based on sample_id, is copied into it), so each sample has to be treated as 10 s long if we want to use it for our calculations (a sketch of that adjustment follows the first query below).

The following query calculates the average CPU utilization over a two-minute window:
SQL> select trunc(count(*)/
  2  (min(p.value)*(max(sample_id) - min (sample_id)))*100,2) "CPU_UTIL",
  3  max(sample_id) - min (sample_id) "sampling time [s]",
  4  count(*) "time consumed per sessions [s]",
  5  min(p.value) * (max(sample_id) - min (sample_id)) "maximum possible time [s]"
  6  from v$active_session_history,
  7  (select value from v$parameter  where name = 'cpu_count') p
  8  where 1=1
  9  and sample_time >= to_date('18-07-2010 12:23:00','dd-mm-yyyy hh24:mi:ss')
 10  and sample_time <= to_date('18-07-2010 12:25:00','dd-mm-yyyy hh24:mi:ss')
 11  and session_state = 'ON CPU';

  CPU_UTIL sampling time [s] time consumed per sessions [s] maximum possible time [s]
---------- ----------------- ------------------------------ -------------------------
     29.23               118                             69                       236
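
If only the AWR copy is available, the same counting approach can be applied to DBA_HIST_ACTIVE_SESS_HISTORY, remembering to weight every sample by 10 s. A hedged sketch for the same two-minute window (output not shown):

-- hedged sketch: CPU utilization from the AWR copy; every row counts as ~10 s
select trunc(count(*) * 10 /
       (min(p.value) * 120) * 100, 2) "CPU_UTIL"   -- 120 s observation window
  from dba_hist_active_sess_history h,
       (select value from v$parameter where name = 'cpu_count') p
 where h.sample_time >= to_date('18-07-2010 12:23:00','dd-mm-yyyy hh24:mi:ss')
   and h.sample_time <= to_date('18-07-2010 12:25:00','dd-mm-yyyy hh24:mi:ss')
   and h.session_state = 'ON CPU';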


But this doesn't help us understand the OEM graph. The refresh rate is set to 15 sec, so we need to display data from that period split into 15 sec slices.
SQL> select trunc(to_char(sample_time,'SSSSS')/15) "15s_sample", trunc(count(*)/
  2  (min(p.value)*15)*100,2) "CPU_UTIL",
  3  count(*) "time consumed per sessions [s]"
  4  from v$active_session_history,
  5  (select value from v$parameter  where name = 'cpu_count') p
  6  where 1=1
  7  and sample_time >= to_date('18-07-2010 12:23:00','dd-mm-yyyy hh24:mi:ss')
  8  and sample_time <= to_date('18-07-2010 12:25:00','dd-mm-yyyy hh24:mi:ss')
  9  and session_state = 'ON CPU'
 10  group by trunc(to_char(sample_time,'SSSSS')/15)
 11  order by 1;

15s_sample   CPU_UTIL time consumed per sessions [s]
---------- ---------- ------------------------------
      2972       6.66                              2
      2973       6.66                              2
      2974      53.33                             16
      2975      23.33                              7
      2976      36.66                             11
      2977      26.66                              8
      2978      33.33                             10
      2979      43.33                             13

8 rows selected.

Now if you use any graphing tool to create a chart with the 15s_sample column as the X axis and the "CPU_UTIL" column as the Y axis, you should get a picture very similar to the OEM chart.

Wait events

There is no easily calculated upper limit for waiting sessions. Why? The number of waits per sample depends on the session state and the number of waiting sessions. So what is the maximum? Should we use only active sessions or all connected sessions? The worst case scenario is that all connected sessions are active and all of them are waiting, so the number of connected sessions multiplied by the sampling time gives us a potential upper limit.
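
As a minimal sketch of where that ceiling comes from (assuming we count only user sessions), the number of connected sessions can be checked directly and multiplied by the 15 s slice length:

-- minimal sketch: theoretical upper bound of waited seconds in one 15 s slice,
-- assuming every connected user session waits for the whole slice
select count(*)      as connected_sessions,
       count(*) * 15 as max_wait_seconds_per_slice
  from v$session
 where type = 'USER';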

The performance tab in OEM calculates an average of the waits in the sample time, divided into wait class groups, and all groups are displayed as a stacked area graph.

The following SQL returns the data used to create such graphs:
SQL> select trunc(to_char(sample_time,'SSSSS')/15) "15s_sample", wait_class, count(*)/15
  2  from v$active_session_history
  3  where sample_time >= to_date('18-07-2010 12:23:00','dd-mm-yyyy hh24:mi:ss')
  4  and sample_time <= to_date('18-07-2010 12:25:00','dd-mm-yyyy hh24:mi:ss')
  5  and session_state = 'WAITING'
  6  group by trunc(to_char(sample_time,'SSSSS')/15), wait_class order by 1,3
  7  ;

15s_sample WAIT_CLASS                                                       COUNT(*)/15
---------- ---------------------------------------------------------------- -----------
      2974 Other                                                                     .2
      2974 System I/O                                                        .533333333
      2974 Commit                                                                     2
      2974 User I/O                                                          2.06666667
      2974 Concurrency                                                       4.66666667
      2975 Application                                                       .066666667
      2975 System I/O                                                        .933333333
      2975 User I/O                                                                 3.8
      2975 Commit                                                            4.93333333
      2976 Concurrency                                                       .066666667
      2976 Application                                                       .066666667
      2976 Other                                                                     .4
      2976 System I/O                                                               1.2
      2976 Commit                                                                   5.4
      2976 User I/O                                                                 5.8
      2977 System I/O                                                        .866666667
      2977 User I/O                                                          2.93333333
      2977 Commit                                                                   5.6
      2978 Concurrency                                                       .066666667
      2978 System I/O                                                        .933333333
      2978 User I/O                                                          3.26666667
      2978 Commit                                                                     4
      2979 System I/O                                                        .866666667
      2979 User I/O                                                          1.46666667
      2979 Commit                                                            2.53333333

25 rows selected.
As a result we have the average time spent by all sessions per wait class, per second. What does it mean that in sample 2976 the average time for wait class "Commit" was 5.4? It means exactly that in that 15 second sampling time all active sessions together were waiting 5.4*15 = 81 times (roughly 81 seconds) for events in the Commit class. Is that bad or not? That is a different question and will not be answered in this post.

Now if we take a look at sample no. 2976:
2976 Application                                                       .066666667
2976 Other                                                                     .4
2976 System I/O                                                               1.2
2976 Commit                                                                   5.4
2976 User I/O                                                                 5.8

The sum of all waits gives us 12.8, and this is more or less what we can see as the top value on the graph. The differences between the graph and our calculation are related to the observation period start time and how we split the results into 15 s sampling slices. We are grouping by sample_time truncated into four periods of seconds within each minute, using a group by on the to_char function (based on Kyle's idea). These periods are 0-15, 15-30, 30-45 and 45-60, but OEM can display data based on differently aligned 15 sec groups, e.g. 10-25 and 25-40, so small differences can occur (a sketch of a shifted grouping is shown below).
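
To see how the alignment affects the numbers, here is a hedged sketch of the same grouping shifted by 10 s (just an illustration of a differently aligned bucket, not what OEM actually runs):

-- hedged sketch: the same 15 s bucketing shifted by 10 s to show the effect of alignment
select trunc((to_char(sample_time,'SSSSS') - 10)/15) "15s_sample_shifted",
       wait_class,
       count(*)/15 "avg_waiting_sessions"
  from v$active_session_history
 where sample_time >= to_date('18-07-2010 12:23:00','dd-mm-yyyy hh24:mi:ss')
   and sample_time <= to_date('18-07-2010 12:25:00','dd-mm-yyyy hh24:mi:ss')
   and session_state = 'WAITING'
 group by trunc((to_char(sample_time,'SSSSS') - 10)/15), wait_class
 order by 1, 3;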

Now the question is: is it a bad thing to have high numbers on the OEM graph? And as always when we are talking about Oracle performance, the answer is: it depends.
What is bad for sure is CPU usage going over 85-90 % and staying there for a long time. In that case CPU response time has a good chance of increasing and whole system performance can go down. What about waits? In my opinion the best way is to create a baseline (even on paper) with the average waiting time and react only when the system is working above our baseline, or, more importantly, when system users are reporting performance problems.

Next time I will use the drill-down functionality of OEM and will try to explain what the other graphs mean. Unfortunately most of them change the scale between the main performance graph and the detailed ones, so if you are not aware of that it can cause a lot of misunderstanding.


Regards,
Marcin