Analyzing a Statspack Report
A guide to the detail sections of the Statspack report
Wait Events Quick Reference Guide
Introduction
Executing Snapshots
Load Profile Section
Top 5 Timed Events Section
Resolving Event Waits Section
Common Wait Events, Explanation
SQL Information Section
Instance Activity Stats Section
Tablespace I/O Stats Section
Buffer cache Activity Information
Enqueue Activity
Undo Segment Information
Latch Information
Library Cache Statistics
Advisory Statistics
Segment Statistics
Some Useful Queries
Creating Excel Reports from Statspack
Web tools to analyze Reports
A guide to the detail sections of the Statspack report
Section(s) | What You Can Use the Section(s) for
Wait Events | Look for excessive waits and wait times; drill down to specific problems
SQL Ordered by Buffer Gets, Physical Reads, and Rows Processed | Figure out which SQL statements to tune
Instance Activity Statistics | Compare with baseline report; compute additional statistics
Tablespace and File I/O | Investigate I/O bottlenecks, identify files and tablespaces with heavy I/O
Buffer Pool | Identify specific buffer pools with high contention or I/O
Buffer Wait Statistics | Identify types of buffers with large numbers of buffer waits
Enqueue Activity | Investigate specific lock types that are causing the most waits
Rollback Segment Statistics and Storage | Investigate waits for rollback segment headers
Latch Activity, Latch Sleep Breakdown, Latch Miss Sources | Identify latching bottlenecks; diagnose related problems
Library Cache | Diagnose problems with the shared pool
Non-default init.ora Parameters | Look for unnecessary or problematic parameter definitions
Wait Events Quick Reference Guide
Wait Problem | Potential Fix
DB File Scattered Read | Wait for multi-block read of a table or index (full scan): tune the code and/or cache small tables.
DB File Sequential Read | Wait for single-block read of a table or index. Indicates many index reads: tune the code (especially joins).
DB File Parallel Read | Occurs when Oracle reads in parallel from multiple datafiles into non-contiguous buffers in memory (PGA or buffer cache). Similar to db file sequential read.
Free Buffer | Increase the DB_CACHE_SIZE; shorten the checkpoint; tune the code.
Buffer Busy | Segment header: add freelists or freelist groups.
Buffer Busy | Data block: separate "hot" data; use reverse key indexes and/or smaller blocks.
Buffer Busy | Data block: increase initrans and/or maxtrans.
Buffer Busy | Undo header: add rollback segments or areas.
Buffer Busy | Undo block: commit more often; use larger rollback segments or areas.
Latch Free | Investigate the latch detail.
Enqueue-ST | Use LMTs or preallocate large extents.
Enqueue-HW | Preallocate extents above the high-water mark.
Enqueue-TX4 | Increase initrans and/or maxtrans on the table or index.
Enqueue-TM | Index foreign keys; check application locking of tables.
Log Buffer Space | Increase the log buffer; use faster disks for the redo logs.
Log File Switch | Archive destination slow or full; add more or larger redo logs.
Log File Sync | Commit more records at a time; use faster redo log disks or raw devices.
Direct Path Read | Used by Oracle when reading directly into the PGA (sort or hash operations).
Direct Path Write | Used by Oracle when writing directly from the PGA (sort or hash operations).
Idle Event | Ignore it.
Introduction
If you could choose just two Oracle utilities to find and monitor performance problems in your database system, those two would be Oracle Enterprise Manager and Statspack. Which area of the summary page you focus on will depend on whether you are investigating a performance problem or monitoring changes in load; in either case, you should start by checking the Top 5 wait events section.
Check Regularly
1. Top 5 wait events
2. Load Profile
3. Instance Efficiency Hit Ratios
4. Wait Events / Wait Event Histograms
5. Latch Waits
6. Top SQL
7. Instance Activity / Time Model Stats / O/S Stats
8. File I/O / File Read Histogram / Undo Stats
9. Memory Allocation
10. Undo
The STATSPACK report shows statistics ONLY
for the node or instance on which it was run.
Run the statspack.snap procedure and the spreport.sql script on each node you want to monitor, so that instances can be compared with one another.
Single-instance tuning should be performed before attempting to tune the processes that communicate via the cluster interconnect.
When statistics and wait events can be misleading
There are certain checks which can be performed to help identify whether a statistic or event is really of interest. When timed_statistics is false, wait events are ordered by the number of waits. This information may indicate which events are of interest, however it may be misleading. An event may be waited for a large number of times, but the actual time waited (if it were available for comparison) may turn out to be small despite the high count, in which case the event is not really of interest.
If wait time is available, a useful comparison can be made by taking the total wait time for an event and comparing it to the elapsed time between snapshots. For example, if the wait event accounts for only 30 seconds out of a two-hour period, there is probably little to be gained by investigating it. However, if the event accounts for 30 minutes of a 45-minute period, it may be worth investigating. There is a warning here, too: even an event with 30 minutes of wait in a 45-minute snapshot may not be indicative of a problem, once you take into account that there were 2,000 users on the system and the host hardware was a 64-node machine.
When interpreting computed statistics (such as percentages or per-second rates), it is important to cross-verify the computed statistic with the actual statistic counts. This acts as a sanity check to determine whether the derived rates are really of interest. On initial examination, a soft-parse ratio of 50% would normally indicate a potential tuning area. However, if the actual statistic counts are small, this would not be an area of interest. For example, if there was one hard parse and one soft parse during the Statspack interval, the soft-parse ratio would be 50%, even though the statistic counts show this is not an area of concern.
SNAPSHOT LEVELS
LEVEL 0 - GENERAL PERFORMANCE
This level can be used to gather general performance information about
the database.
LEVEL 5 - GENERAL PERFORMANCE + SQL STATEMENTS (DEFAULT)
This snapshot level will gather all the information from the previous
levels, plus it will collect performance data on high resource SQL
statements. This is also the default snapshot level when Statspack is
installed.
LEVEL 6 - GENERAL PERFORMANCE + SQL STATEMENTS + SQL PLANS AND
SQL PLAN USAGE
This level is new in Oracle9i and it will include all the information
collected from the previous snapshot levels, plus execution path and
plan usage information as they relate to high resource SQL statements.
This type of information can prove critical when determining if the
execution
path or plan has changed for high resource SQL statements. Oracle recommends using this level when one of the following situations has occurred:
- A plan has possibly changed after large volumes of data have been added.
- New optimizer setting information needs to be obtained.
LEVEL 10 - GENERAL PERFORMANCE + SQL STATEMENTS + SQL PLANS AND
SQL PLAN USAGE + PARENT AND CHILD LATCHES
This level will include all the information collected from previous
snapshot levels, plus the addition of parent and child latch
information.
This level will take even longer to complete, since gathering parent and child latch information adds to the duration of the previous levels, which are already information-gathering intensive: the volume of data collected depends on the shared_pool_size and on the amount of SQL statement information, and now also includes the parent and child latch information. Because snapshots at this level take so long, Oracle recommends using it only when requested by Oracle technical support personnel.
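A level can be requested for a single snapshot, or saved as the new instance default, through arguments to the snap procedure. A sketch, assuming the i_snap_level and i_modify_parameter arguments documented in spdoc.txt:

```sql
-- Take one level 6 snapshot without changing the default level:
EXECUTE STATSPACK.SNAP(i_snap_level => 6);

-- Take a level 6 snapshot AND save 6 as the new default level:
EXECUTE STATSPACK.SNAP(i_snap_level => 6, i_modify_parameter => 'true');
```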
LEVEL SETTING RECOMMENDATION
It is recommended to set timed_statistics to true BEFORE the first snapshot, because it helps establish a better baseline; otherwise another baseline will be needed AFTER it is turned on. This can be done with the ALTER SYSTEM command and/or by setting it in the init.ora file.
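For example (a sketch; also add the parameter to init.ora so the setting survives an instance restart):

```sql
-- Enable timed statistics for the running instance:
ALTER SYSTEM SET timed_statistics = TRUE;
```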
SESSION SPECIFIC SNAPSHOT
Statspack also provides the capability to gather session specific
information. Passing the i_session_id value to the Statspack.snap
procedure
will enable this option.
The following is an example of using this feature:
SQL> EXECUTE STATSPACK.SNAP(i_session_id=>20);
EXECUTING SNAPSHOTS
Executing a snapshot interactively can be as easy as accessing SQL*Plus as the PERFSTAT user and using the STATSPACK.SNAP command, or the execution can be automated. The interactive method is highly beneficial when a problem is reported in the database and a snapshot could prove useful for troubleshooting, whereas the value of an automated snapshot is realized when a problem is reported at a later time and a comparison needs to be made between two specific times that occurred in the past.
INTERACTIVE METHOD
Access SQL*Plus as the PERFSTAT user and execute either method 1, 2 or
3 as discussed in the above snapshot Configuration section. The
simplest form of the interactive mode is as follows:
SQL> EXECUTE STATSPACK.SNAP
AUTOMATED METHOD
The ability to automate a snapshot is another one of the great features of the Statspack utility. Automating and scheduling when to take snapshots allows for the collection of database performance information that is beneficial for troubleshooting performance problems that occurred earlier. The following are two ways that snapshots can be automated:
- Oracle's DBMS_JOB utility to schedule snapshots. This utility is discussed in greater detail below.
- An operating-system-specific job scheduler. For example, on Unix, shell scripts can be written and then scheduled through the CRON scheduler; on NT, the AT scheduler can be used in combination with .cmd files.
DBMS_JOB UTILITY
The DBMS_JOB utility provides a way to schedule database-related tasks that are controlled within the database. Through the DBMS_JOB utility, snapshots can be taken at a scheduled interval. When the spcpkg.sql script was executed as part of the Statspack installation, the DBMS_JOB package was created for the PERFSTAT user. One of the requirements to use the DBMS_JOB utility is that the init.ora parameter job_queue_processes must be set to a value greater than 0. The spauto.sql script is designed to set up the automation of executing snapshots once every hour. The following line from the script is how the job is added to the schedule:
dbms_job.submit(:jobno, 'statspack.snap;', trunc(sysdate+1/24,'HH'), -
'trunc(SYSDATE+1/24,''HH'')', TRUE, :instno);
The benefits of using the spauto.sql script are that it:
- Displays the job number assigned
- Identifies the number of job_queue_processes set for the database
- Displays the next time that the snapshot will occur
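If a different frequency is later needed, the job created by spauto.sql can be rescheduled through the same package. A sketch, assuming the job number reported by spauto.sql was 1 (substitute your own):

```sql
-- Change the snapshot job to run every half hour instead of hourly:
EXECUTE DBMS_JOB.INTERVAL(1, 'trunc(SYSDATE+1/48,''MI'')');

-- Remove the job altogether when snapshots are no longer needed:
-- EXECUTE DBMS_JOB.REMOVE(1);
```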
Load Profile Section
The Load Profile section of the Statspack report is useful primarily in comparing two reports to see if the load characteristics from the two report periods are similar. In proactive tuning, you generate reports on a routine basis and check the Load Profile for changes in throughput (shown by the per-second statistics) and changes in application characteristics (shown by the per-transaction statistics). In reactive tuning, you use the Load Profile to verify the validity of comparing a report generated during a problem period with a report generated during a baseline period. Make sure the systems were running comparable workloads during the two report periods. For example, if one report showed a majority of read-only activity and the second was very update-intensive, comparing the two reports would not be valid.
If you are not comparing two reports, it's still a good idea to scan the Load Profile for any rates that seem high, irrespective of a baseline. For example, a high hard-parse rate (say, greater than 100 per second) may have serious implications for performance. High hard-parse rates are likely to be accompanied by latch contention, so you would expect to see latch free waits in the Top 5 Wait Events or high in the complete Wait Events list.
Load Profile
~~~~~~~~~~~~                         Per Second      Per Transaction
                                ---------------      ---------------
               Redo size:            351,530.67             7,007.37
           Logical reads:              5,449.81               108.64  --Key Metric
           Block changes:              1,042.08                20.77
          Physical reads:                 37.71                 0.75  --Key Metric
         Physical writes:                134.68                 2.68  --Key Metric
              User calls:              1,254.72                25.01
                  Parses:                  4.92                 0.10
             Hard parses:                  0.02                 0.00
                   Sorts:                 15.73                 0.31
                  Logons:                 -0.01                 0.00
                Executes:                473.73                 9.44  --Key Metric
            Transactions:                 50.17                       --Key Metric

  % Blocks changed per Read:   19.12     Recursive Call %:     4.71
 Rollback per transaction %:    2.24     Rows per Sort:       20.91
Where:
. Redo size: This is the amount of redo generated during this report. If you see an increase here then more DML statements are taking place (meaning your users are doing more INSERTs, UPDATEs, and DELETEs than before).
. Logical Reads: This is calculated as Consistent Gets + DB Block Gets = Logical Reads.
. Block changes: The number of blocks modified during the sample interval. If you see an increase here then more DML statements are taking place (meaning your users are doing more INSERTs, UPDATEs, and DELETEs than before).
. Physical Reads: The number of requests for a block that caused a physical I/O.
. Physical Writes: The number of physical writes issued.
. User Calls: The number of calls (such as login, parse, fetch, or execute) made by user sessions.
. Parses: Total of all parses: both hard and soft.
. Hard Parses: Those parses requiring a completely new parse of the SQL statement. A ‘hard parse’ rate of greater than 100 per second indicates there is a very high amount of hard parsing on the system. High hard parse rates cause serious performance issues and must be investigated. A high hard parse rate is usually accompanied by latch contention on the shared pool and library cache latches. Check whether waits for ‘latch free’ appear in the top-5 wait events, and if so, examine the latching sections of the Statspack report. Of course, we want a low number here.
. Soft Parses: Not listed but derived by subtracting the hard parses from parses. A soft parse reuses a previous hard parse and hence consumes far fewer resources. A high soft parse rate could be anywhere in the range of 300 or more per second. Unnecessary soft parses also limit application scalability; optimally a SQL statement should be soft-parsed once per session and executed many times.
. Sorts and Logons are self-explanatory.
. Executes: How many statements we are executing per second / transaction.
. Transactions: How many transactions per second we process.
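Most Load Profile lines are computed from cumulative counters in V$SYSSTAT: Statspack stores them at each snapshot and reports the delta divided by elapsed seconds or by transactions. A sketch of where the raw numbers come from (statistic names assume a 9i-era instance; values are cumulative since startup):

```sql
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('redo size', 'session logical reads', 'db block changes',
                'physical reads', 'physical writes', 'user calls',
                'parse count (total)', 'parse count (hard)',
                'sorts (memory)', 'sorts (disk)', 'execute count',
                'user commits', 'user rollbacks');
```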
This gives an overall view of the load on the server. In this case, we are looking at a very good hard parse number and a fairly high system load.
The per-second statistics show you the changes in throughput (i.e.
whether the instance is performing more work per second). For example:
• a significant increase in ‘redo size’, ‘block changes’ and ‘pct of
blocks changed per read’ would indicate the instance is performing more
inserts/updates/deletes.
• an increase in the ‘redo size’ without an increase in the number of
‘transactions per second’ would indicate a changing transaction profile.
Similarly, looking at the per-transaction statistics allows you to
identify changes in the application characteristics by comparing these
to the corresponding statistics from the baseline report.
Sample Analysis
Load Profile
~~~~~~~~~~~~                         Per Second      Per Transaction
                                ---------------      ---------------
               Redo size:          1,316,849.03             6,469.71
           Logical reads:             16,868.21                82.87
           Block changes:              5,961.36                29.29
          Physical reads:                  7.51 (B)             0.04
         Physical writes:              1,044.74 (B)             5.13
              User calls:              8,432.99 (C)            41.43
                  Parses:              1,952.99 (D)             9.60
             Hard parses:                  0.01 (D)             0.00
                   Sorts:                  1.44                 0.01
                  Logons:                  0.05                 0.00
                Executes:              1,954.97                 9.60
            Transactions:                203.54

  % Blocks changed per Read:   35.34 (A)   Recursive Call %:    25.90
 Rollback per transaction %:    9.55       Rows per Sort:      137.38
Observations:
• This system is generating a lot of redo (over 1 MB/s), with 35% of all blocks read being updated. (A)
• Comparing the number of physical reads per second to the number of physical writes per second shows the physical read to physical write ratio is very low (roughly 1:139). Typical OLTP systems have a read-to-write ratio of 10:1 or 5:1, so this ratio is quite unusual. (B)
• This system is quite busy, with 8,432 User calls per second.(C)
• The total parse rate (Parses per second) seems to be high, with the
Hard parse rate very low, which implies the majority of the parses are
soft parses. The high parse rate may tie in with the latch free event,
if the latch contended for is the library cache latch, however no
assumptions should be made. (D)
On the whole, this seems to be a heavy workload, with many parses, and
writes.
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.99       Redo NoWait %:   100.00
               Buffer Hit %:   95.57    In-memory Sort %:    97.55
              Library Hit %:   99.89        Soft Parse %:    99.72
         Execute to Parse %:   88.75         Latch Hit %:    99.11
Parse CPU to Parse Elapsd %:   52.66     % Non-Parse CPU:    99.99

 Shared Pool Statistics        Begin    End
                               ------   ------
             Memory Usage %:   42.07    43.53
    % SQL with executions>1:   73.79    75.08
  % Memory for SQL w/exec>1:   76.93    77.64
Interpreting the ratios in this section can be slightly more complex than it may seem at first glance. While high values for the ratios are generally good (indicating high efficiency), such values can be misleading: your system may be doing something efficiently that it would be better off not doing at all. Similarly, low values aren't always bad. For example, a low in-memory sort ratio (indicating a low percentage of sorts performed in memory) would not necessarily be a cause for concern in a decision-support system (DSS) environment, where user response time is less critical than in an online transaction processing (OLTP) environment.
Basically, you need to keep in mind the characteristics of your application - whether it is query-intensive or update-intensive, whether it involves lots of sorting, and so on - when you're evaluating the Instance Efficiency Percentages.
It is possible for both the 'buffer hit' and the 'execute to parse' ratios to be negative. In the case of the buffer hit ratio, the buffer cache is too small and the data is being aged out before it can be used, so it must be retrieved again. This is a form of thrashing which degrades performance immensely.
Buffer Nowait Ratio.
This is the percentage of time that the instance made a call to get a buffer (all buffer types are included here) and that buffer was made available immediately (meaning it didn't have to wait for the buffer, hence "Buffer Nowait"). If the ratio is low, one or more (hot) blocks are being contended for; check the Buffer Wait Statistics section of the report for more detail on which type of block is being contended for.
Buffer Hit Ratio. (Also known as the buffer-cache hit ratio.) Ideally more than 95 percent. It shows the percentage of times a particular block was found in the buffer cache instead of requiring a physical I/O (reading from disk).
Although historically known as one of the most important statistics to evaluate, this ratio can sometimes be misleading. A low buffer hit ratio does not necessarily mean the cache is too small; it may be that potentially valid full-table scans are artificially reducing what is otherwise a good ratio. Similarly, a high buffer hit ratio (say, 99 percent) normally indicates that the cache is adequately sized, but this assumption may not always be valid. For example, frequently executed SQL statements that repeatedly refer to a small number of buffers via indexed lookups can create a misleadingly high buffer hit ratio. When these buffers are read, they are placed at the most recently used (MRU) end of the buffer cache; iterative access to these buffers can artificially inflate the buffer hit ratio. This inflation makes tuning the buffer cache a challenge. Sometimes you can identify a too-small buffer cache by the appearance of the write complete waits event, which indicates that hot blocks (that is, blocks that are still being modified) are aging out of the cache while they are still needed; check the Wait Events list for evidence of this event.
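Outside of a Statspack interval, the same ratio can be approximated from the cumulative counters. A sketch using the classic formula (Statspack uses the deltas between two snapshots rather than the values since startup):

```sql
SELECT ROUND(100 * (1 - phy.value / (db.value + con.value)), 2)
         AS "Buffer Hit %"
FROM   v$sysstat db, v$sysstat con, v$sysstat phy
WHERE  db.name  = 'db block gets'
AND    con.name = 'consistent gets'
AND    phy.name = 'physical reads';
```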
Library Hit Ratio. This ratio, also known as the library-cache hit ratio, gives the percentage of pin requests that result in pin hits. A pin hit occurs when the SQL or PL/SQL code to be executed is already in the library cache and is valid to execute. If the "Library Hit ratio" is low, it could be indicative of a shared pool that is too small (SQL is prematurely pushed out of the shared pool), or, just as likely, that the system did not make correct use of bind variables in the application. If the soft parse ratio is also low, check whether there's a parsing issue. A lower ratio could also indicate that bind variables are not used or some other issue is causing SQL not to be reused (in which case a larger shared pool may only be a band-aid, and a library cache latch problem may still result).
Execute to Parse. If the value is negative, it means that the number of parses is larger than the number of executions. Another cause for a negative execute to parse ratio is a shared pool that is too small, with queries aging out of the shared pool and needing to be reparsed. This is another form of thrashing, which also degrades performance tremendously. If you run some SQL and it has to be parsed every time you execute it (because no plan exists for this statement), then your percentage would be 0%, which is very bad. The more times your SQL statement can reuse an existing plan, the higher your Execute to Parse ratio is. One way to increase this ratio is to use bind variables.
Parse CPU to Parse Elapsd %: Generally, this is a measure of
how available your CPU cycles were for SQL parsing. If this is low, you
may see "latch free" as one of your top wait events.
Redo Nowait Ratio. This ratio indicates the amount of redo entries generated for which there was space available in the redo log; if it is 100%, the instance didn't have to wait for space in the redo log. The redo-log space-request statistic is incremented when an Oracle process attempts to write a redo-log entry but there is not sufficient space remaining in the online redo log. Thus, a value close to 100 percent for the redo nowait ratio indicates minimal time spent waiting for redo logs to become available, either because the logs are not filling up very often or because the database is able to switch to a new log quickly whenever the current log fills up.
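The ratio is derived from two V$SYSSTAT counters. A sketch of the computation (Statspack performs it on the snapshot deltas):

```sql
SELECT ROUND(100 * (1 - req.value / ent.value), 2) AS "Redo NoWait %"
FROM   v$sysstat req, v$sysstat ent
WHERE  req.name = 'redo log space requests'
AND    ent.name = 'redo entries';
```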
If your alert log shows that you are switching logs frequently
(that is, more than once every 15 minutes), you may be able to reduce
the amount of switching by increasing the size of the online redo logs.
If the log switches are not frequent, check the disks on which the redo
logs reside to see why the switches are not happening quickly. If these
disks are not overloaded, they may be slow, which means you could put
the files on faster disks.
In-Memory Sort Ratio. This ratio gives the percentage of sorts that were performed in memory, rather than requiring a disk-sort segment to complete the sort. Optimally, in an OLTP environment, this ratio should be high. Setting the PGA_AGGREGATE_TARGET (or SORT_AREA_SIZE) initialization parameter effectively will eliminate this problem; as a minimum, aim to keep this ratio above 95%.
Soft Parse Ratio. This ratio gives the percentage of parses that
were soft, as opposed to hard. A soft parse occurs when a session
attempts to execute a SQL statement and a usable version of the
statement
is already in the shared pool. In other words, all data (such as the
optimizer execution plan) pertaining to the statement in the shared
pool
is equally applicable to the statement currently being issued. A hard
parse, on the other hand, occurs when the current SQL statement is
either
not in the shared pool or not there in a shareable form. An example of
the latter case would be when the SQL statement in the shared pool is
textually identical to the current statement but the tables referred
to in the two statements resolve to physically different tables.
Hard parsing is an expensive operation and should be kept to a
minimum in an OLTP environment. The aim is to parse once, execute many
times.
Ideally, the soft parse ratio should be greater than 95 percent. When the soft parse ratio falls much below 80 percent, investigate whether you can share SQL by using bind variables, or force cursor sharing by using the init.ora parameter cursor_sharing.
Before you jump to any conclusions about your soft parse ratio,
however, be sure to compare it against the actual hard and soft parse
rates shown in the Load Profile. If the rates are low (for example, 1
parse per second), parsing may not be a significant issue in your
system.
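Returning both the raw counts and the ratio makes that sanity check easy. A sketch against the cumulative counters:

```sql
SELECT tot.value                                    AS total_parses,
       hard.value                                   AS hard_parses,
       ROUND(100 * (1 - hard.value / tot.value), 2) AS "Soft Parse %"
FROM   v$sysstat tot, v$sysstat hard
WHERE  tot.name  = 'parse count (total)'
AND    hard.name = 'parse count (hard)';
```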
Another useful standard of comparison is the proportion of parse time
that was not CPU-related, given by the following ratio:
(parse time CPU) / (parse time elapsed)
A low value for this ratio could mean that the non-CPU-related
parse time was spent waiting for latches, which might indicate a
parsing or latching problem. To investigate further, look at the
shared-pool and library-cache latches in the Latch sections of the
report for indications
of contention on these latches.
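Both quantities are available as V$SYSSTAT statistics (reported in centiseconds). A sketch:

```sql
SELECT ROUND(100 * cpu.value / ela.value, 2)
         AS "Parse CPU to Parse Elapsd %"
FROM   v$sysstat cpu, v$sysstat ela
WHERE  cpu.name = 'parse time cpu'
AND    ela.name = 'parse time elapsed';
```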
Latch Hit Ratio. This is the ratio of the total number of latch
misses to the number of latch gets for all latches. A low value for
this ratio indicates a latching problem, whereas a high value is
generally good. However, as the data is rolled up over all latches, a
high latch hit ratio can artificially mask a low get rate on a specific
latch. Cross-check this value with the Top 5 Wait Events to see if
latch
free is in the list, and refer to the Latch sections of the report.
Latch Hit % of less than 99 percent is usually a big problem.
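The rolled-up ratio comes from V$LATCH; for the cross-check described above, the view's per-latch rows are more informative than the total. A sketch:

```sql
-- Overall latch hit ratio (all latches rolled up):
SELECT ROUND(100 * (1 - SUM(misses) / SUM(gets)), 2) AS "Latch Hit %"
FROM   v$latch;

-- Per-latch breakdown for the worst offenders:
SELECT name, gets, misses, sleeps
FROM   v$latch
WHERE  misses > 0
ORDER  BY misses DESC;
```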
Also check the "Shared Pool Statistics"; if the "End" value is in the high 95%-100% range, this is an indication that the shared pool needs to be increased (especially if the "Begin" value is much smaller).
% SQL with executions>1: Shows the percentage of SQL statements executed more than once. This should be very near 100.
% Memory for SQL w/exec>1: Of the memory allocated to cursors, shows the percentage used by cursors executed more than once. A ratio above 80% is generally healthy.
Sample Analysis
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   98.56         Redo NoWait %:  100.00
               Buffer Hit %:   99.96      In-memory Sort %:   99.84
              Library Hit %:   99.99         Soft Parse %:   100.00 (A)
         Execute to Parse %:    0.10 (A)      Latch Hit %:    99.37
Parse CPU to Parse Elapsd %:   58.19 (A)  % Non-Parse CPU:    99.84

 Shared Pool Statistics        Begin    End
                               ------   ------
             Memory Usage %:   28.80    29.04 (B)
    % SQL with executions>1:   75.91    76.03
  % Memory for SQL w/exec>1:   83.65    84.09
Observations:
• The 100% soft parse ratio (A) indicates the system is not hard-parsing. However, the system is soft parsing a lot, rather than only re-binding and re-executing the same cursors, as the Execute to Parse % is very low (A). Also, the CPU time used for parsing (A) is only 58% of the total elapsed parse time (see Parse CPU to Parse Elapsd). This may also imply some resource contention during parsing (possibly related to the latch free event?).
• There seems to be a lot of unused memory in the shared pool (only 29% is used) (B). If there is insufficient memory allocated to other areas of the database (or OS), this memory could be redeployed.
***Please see the following NOTES on shared pool issues:
[NOTE:146599.1] Diagnosing and Resolving Error ORA-04031
[NOTE:62143.1] Understanding and Tuning the Shared Pool
[NOTE:105813.1] SCRIPT TO SUGGEST MINIMUM SHARED POOL SIZE
Just knowing the breakdown of time into the above 3 categories (time spent waiting for a resource, time spent on CPU, and idle time) is not very useful, so Oracle has a set of 'Wait Events' for the waiting and idle activities, and can record CPU utilization for the time spent on CPU. This is best illustrated with a simplified example of a few seconds in the life of an Oracle shadow process:
State    Notes...
~~~~~    ~~~~~~~~
IDLE     Waiting for 'SQL*Net message from client'.
         Receives a SQL*Net packet requesting 'parse/execute' of a statement.
ON CPU   Decodes the SQL*Net packet.
WAITING  Waits for 'latch free' to obtain a 'library cache' latch.
         Gets the latch.
ON CPU   Scans for the SQL statement in the shared pool, finds a match,
         frees the latch, sets up links to the shared cursor etc. and
         begins to execute.
WAITING  Waits for 'db file sequential read' as we need a block which is
         not in the buffer cache, i.e. waiting for an I/O to complete.
ON CPU   Block read has completed so execution can continue.
         Constructs a SQL*Net packet to send back to the user containing
         the first row of data.
WAITING  Waits on 'SQL*Net message to client' for an acknowledgement that
         the SQL*Net packet was reliably delivered.
IDLE     Waits on 'SQL*Net message from client' for the next thing to do.
Top 5 Timed Events Section
This section is among the most important and relevant sections in the Statspack report. Here is where you find out what events (typically wait events) are consuming the most time. In Oracle9i Release 2, this section includes a new event: CPU time.
When you are trying to eliminate bottlenecks on your system, your Statspack report's Top 5 Timed Events section is the first place to look. This section of the report shows the top 5 wait events, the full list of wait events, and the background wait events. If your system's TIMED_STATISTICS initialization parameter is set to true, the events are ordered by time waited, which is preferable, since the number of waits alone does not show each event's impact. If TIMED_STATISTICS is false, the events are ordered by the number of waits.
Listing 1 shows a large number of waits related to reading a single block (db file sequential read) as well as waits for latches (latch free). You can see in this listing high waits for some of the writing to datafiles and log files. To identify which of these are major issues, you must narrow down the list by investigating the granular reports within other sections of Statspack.
Code Listing 1: Statspack report showing waits related to reading a single block
Top 5 Wait Events
------------------------------------
Event Waits Time (s) % Total Elap. Time
------------------------------------------------------------------------------
db file sequential read 18,977,104 22,379,571 82.29
latch free 4,016,773 2,598,496 9.55
log file sync 1,057,224 733,490 2.70
log file parallel write 1,054,006 503,695 1.85
db file parallel write 1,221,755 404,230 1.49
This section shows the TOP 5 wait events the processes were waiting on during the snapshot time period. These events are helpful during analysis of any database-related performance bottlenecks. The columns are:
- Wait Class: helps classify whether the issue is related to the application or the infrastructure.
- Waits: the number of times the wait occurred.
- Time(s): the total time in seconds spent waiting on the event.
Important wait events and their causes are explained in detail below.
Wait Events Information Section
The following section describes in detail most of the sections provided in a Statspack report.
- Foreground Wait Events: Foreground wait events are those associated with a session or client process waiting for a resource.
- Background Wait Events: Background wait events are those not associated with a client process. They indicate waits encountered by system and non-system processes. Examples of background system processes are LGWR and DBWR. An example of a non-system background process would be a parallel query slave. Note that it is possible for a wait event to appear in both the foreground and background wait event statistics, for example the enqueue and latch free events. The idle wait events appear at the bottom of both sections and can generally safely be ignored. Typically these types of events keep track of the time while the client is connected to the database but no requests are being made to the server.
A critical activity in Database Performance
Tuning is Response Time Analysis: this consists of finding out where
time is being spent in a database. Response Time Analysis for an
Oracle Database is done using the
following equation:
Response Time = Service Time + Wait Time
'Service Time'
is measured using the statistic 'CPU used by this session'
'Wait Time' is measured by summing up time spent on Wait Events
When presented with such a list of top Wait Events it sometimes
becomes easy to simply start dealing with the listed Wait Events and to
forget evaluating their impact on overall Response Time first. In
situations where 'Service Time' i.e. CPU usage is much more significant
than 'Wait Time', it is very likely that investigating Wait Events will
not produce significant savings in 'Response Time'. Therefore, one should always compare the time taken
by the top wait events to the 'CPU used by this session' and direct the
tuning effort to the biggest consumers.
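This comparison can be sketched with a quick query against the dynamic performance views. This is an illustrative sketch, not part of the Statspack report itself; it reads 'CPU used by this session' from V$SYSSTAT and sums non-idle wait time from V$SYSTEM_EVENT (both cumulative since startup, in centiseconds), and the list of idle events excluded here is abbreviated, not exhaustive:

```sql
-- Rough Response Time breakdown: Service Time vs. Wait Time (centiseconds)
select (select value
          from v$sysstat
         where name = 'CPU used by this session') as service_time,
       (select sum(time_waited)
          from v$system_event
         where event not like '%SQL*Net%'
           and event not in ('pmon timer', 'smon timer',
                             'rdbms ipc message', 'NULL event')) as wait_time
  from dual;
```

Direct the tuning effort at whichever of the two columns dominates.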
Note:
To address this possible source of confusion, starting with Oracle9i
Release 2 the "Top 5 Wait Events" section has been renamed to "Top 5
Timed Events". Here, 'Service Time' as measured by the statistic 'CPU
used by this session' is listed as 'CPU time'. This means that it is
now easier to accurately measure the impact of Wait Events in overall
'Response Time' and to correctly target the subsequent tuning effort.
In this section we list the I/O-related Wait
Events that occur most
often in Oracle databases together with reference notes describing each
wait.
Datafile I/O-Related Wait Events:
'db file sequential read' [NOTE:34559.1]
'db file scattered read' [NOTE:34558.1]
'db file parallel read'
'direct path read' [NOTE:50415.1]
'direct path write' [NOTE:50416.1]
'direct path read (lob)'
'direct path write (lob)'
Controlfile I/O-Related Wait Events:
'control file parallel write'
'control file sequential read'
'control file single write'
Redo Logging I/O-Related Wait Events:
'log file parallel write' [NOTE:34583.1]
'log file sync' [NOTE:34592.1]
'log file sequential read'
'log file single write'
'switch logfile command'
'log file switch completion'
'log file switch (clearing log file)'
'log file switch (checkpoint incomplete)'
'log switch/archive'
'log file switch (archiving needed)'
Buffer Cache I/O-Related Wait Events:
'db file parallel write' [NOTE:34416.1]
'db file single write'
'write complete waits'
'free buffer waits'
If you want a quick instance-wide wait event status, showing which events are the biggest contributors to total wait time, you can use the following query (note that the SQL*Net events must be filtered with NOT LIKE, since NOT IN does not perform pattern matching):
select event, total_waits, time_waited
from v$system_event
where event not in ('pmon timer', 'smon timer', 'rdbms ipc reply',
                    'parallel dequeue wait', 'virtual circuit status',
                    'client message', 'NULL event')
and event not like '%SQL*Net%'
order by time_waited desc;
EVENT                       TOTAL_WAITS   TIME_WAITED
--------------------------- ------------- -------------
db file sequential read         35051309      15965640
latch free                       1373973       1913357
db file scattered read           2958367       1840810
enqueue                             2837        370871
buffer busy waits                 444743        252664
log file parallel write           146221        123435
1. DB File Scattered Read.
This generally happens during full table scans or Fast Full Index Scans.
As full table scans are pulled into memory, they rarely
fall into contiguous buffers but instead are scattered throughout the
buffer cache. A large number here indicates that your table may have
missing indexes or your indexes are not used. Although it may be more
efficient in
your situation to perform a full table scan than an index scan, check
to ensure that full table scans are necessary when you see these waits.
Try to cache small tables to avoid reading them in over and over again,
since a full table scan is put at the cold end of the LRU (Least
Recently Used) list. You can use the Statspack report
to help identify the query in question and fix it.
The init.ora parameter db_file_multiblock_read_count specifies the maximum number of blocks read in that way. Typically, this parameter should
have values of 4-16 independent of the size of the database but with
higher values needed with smaller Oracle block sizes. If you have a
high wait time for this event, you either need to reduce the cost of
I/O, e.g. by getting faster disks or by distributing your I/O load
better, or you need to reduce the amount of full table scans by tuning
SQL statements. The appearance of the 'db file scattered read' and 'db file sequential read' events may not necessarily indicate a problem, as I/O is a normal activity on a healthy instance. However, they can indicate problems if any of the following circumstances are true:
• The data-access method is bad (that is, the SQL statements are poorly
tuned), resulting in unnecessary or inefficient IO operations
• The IO system is overloaded and performing poorly
• The IO system is under-configured for the load
• IO operations are taking too long
If this Wait Event is a significant portion
of Wait Time then a number of approaches are possible:
o Find which SQL statements perform Full Table or Fast Full Index scans
and tune them to make sure these scans are necessary and not the result
of a suboptimal plan.
- Starting with Oracle9i the new V$SQL_PLAN view can help.
For Full Table scans:
select sql_text
from v$sqltext t, v$sql_plan p
where t.hash_value = p.hash_value
and p.operation = 'TABLE ACCESS'
and p.options = 'FULL'
order by p.hash_value, t.piece;
For Fast Full Index scans:
select sql_text
from v$sqltext t, v$sql_plan p
where t.hash_value = p.hash_value
and p.operation = 'INDEX'
and p.options = 'FULL SCAN'
order by p.hash_value, t.piece;
o In cases where such multiblock scans occur
from optimal execution plans it is possible to tune the size of
multiblock I/Os issued by Oracle by setting the instance parameter
DB_FILE_MULTIBLOCK_READ_COUNT so that:
DB_BLOCK_SIZE x DB_FILE_MULTIBLOCK_READ_COUNT = max_io_size of system
Query tuning should be used to optimize online SQL to use indexes.
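The sizing rule above can be illustrated with a small sketch; the block size and maximum I/O size here are assumptions for the example, not recommendations:

```sql
-- Hypothetical system: DB_BLOCK_SIZE = 8192 and a 1 MB maximum I/O size,
-- so 1048576 / 8192 = 128 blocks per multiblock read.
alter system set db_file_multiblock_read_count = 128;
```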
2. DB File Sequential Read.
This could indicate poor joining orders in your SQL, or waits for writes to TEMP space (direct loads, parallel DML (PDML) such as parallel updates). It could mean that a lot of index reads/scans are going on. Depending on the problem it may help to tune PGA_AGGREGATE_TARGET and/or DB_CACHE_SIZE.
The sequential read event identifies Oracle reading blocks sequentially, i.e. one after the other. A large number of waits here could indicate poor joining orders of tables or unselective indexing; you either need to reduce the cost of I/O, e.g. by getting faster disks or by distributing your I/O load better, or you need to reduce the amount of I/O by increasing the buffer cache or by tuning SQL statements. It is
normal
for this number to be large for a high-transaction, well-tuned system,
but it can indicate problems in some circumstances. You should
correlate
this wait statistic with other known issues within the Statspack
report,
such as inefficient SQL. Check to ensure that index scans are
necessary,
and check join orders for multiple table joins. The DB_CACHE_SIZE will
also be a determining factor in how often these waits show up.
Problematic
hash-area joins should show up in the PGA memory, but they're also
memory
hogs that could cause high wait numbers for sequential reads. They can
also show up as direct path read/write waits. These
circumstances
are usually interrelated. When they occur in conjunction with the
appearance
of the 'db file scattered read'
and 'db file sequential read'
in the Top
5 Wait Events section, first you should examine the SQL Ordered by
Physical Reads section of the report, to see if it might be helpful to
tune the statements with the highest resource usage.
It could also be that the indexes are fragmented; if that is the case, rebuilding the index will compact it and result in visiting fewer blocks.
Then, to determine
whether there is a potential I/O bottleneck, examine the OS I/O
statistics
for corresponding symptoms. Also look at the average time per read in
the Tablespace and File I/O sections of the report. If many I/O-related
events appear high in the Wait Events list, re-examine the host
hardware
for disk bottlenecks and check the host-hardware statistics for
indications
that a disk reconfiguration may be of benefit.
Block reads are fairly inevitable so the aim should be to minimize
unnecessary I/O. I/O for sequential reads can be reduced by tuning SQL
calls that result in full table scans and using the partitioning option
for large tables.
3. Free Buffer Waits. When a session needs a free buffer and cannot find one, it will post the database writer process asking it to flush dirty blocks. Waits in this category may indicate that you need to increase the buffer cache (DB_CACHE_SIZE), if all your SQL is tuned. Free buffer waits could also indicate that unselective
SQL is causing data to flood the buffer cache with index blocks,
leaving
none for this particular statement that is waiting for the system to
process. This normally indicates that there is a substantial amount of
DML (insert/update/delete) being done and that the Database Writer
(DBWR) is not writing quickly enough; the buffer cache could be full of
multiple versions of the same buffer, causing great inefficiency. To
address this, you may want to consider accelerating
incremental
checkpointing,
using more DBWR processes, or increasing the number of physical disks.
To investigate whether this is an I/O problem, look at the Statspack file I/O statistics. Increase the DB_CACHE_SIZE; shorten the checkpoint interval; tune the code to generate fewer dirty blocks; get faster I/O; use multiple DBWR processes.
4. Buffer Busy Waits. A buffer busy wait happens when multiple processes
concurrently want to modify the same block in the buffer cache. This
typically happens during massive parallel inserts if your tables do not
have free lists and it can happen if you have too few rollback segments. Buffer busy
waits should not be greater than 1 percent. Check the Buffer
Wait
Statistics
section (or V$WAITSTAT) to find out if the wait is on a segment header.
If this is the case, increase the freelist groups or increase the
pctused to pctfree gap. If the wait is on an undo header, you can
address
this by adding rollback segments; if it's on an undo block, you need
to reduce the data density on the table driving this consistent read
or increase the DB_CACHE_SIZE. If the wait is on a data block, you can
move data to another block to avoid this hot block, increase the
freelists
on the table, or use Locally Managed Tablespaces (LMTs). If it's on an
index block, you should rebuild the index, partition the index, or use
a reverse key index. To prevent buffer busy waits related to data
blocks,
you can also use a smaller block size: fewer records fall within a
single
block in this case, so it's not as "hot." When a DML (insert/update/
delete) occurs, Oracle Database writes information into the block,
including all users who are "interested" in the state of the block
(Interested
Transaction List, ITL). To decrease waits in this area, you can
increase
the initrans, which will create the space in the block to allow
multiple
ITL slots. You can also increase the pctfree on the table where this
block exists (this writes the ITL information up to the number
specified
by maxtrans, when there are not enough slots built with the initrans
that is specified). Buffer busy waits can be reduced by using
reverse-key indexes for busy indexes and by partitioning busy tables.
Buffer Busy Wait on Segment Header – Add freelists (if inserts) or freelist groups (especially RAC). Use ASSM.
Buffer Busy Wait on Data Block – Separate 'hot' data; potentially use reverse key indexes; fix queries to reduce the block's popularity; use smaller blocks; increase initrans and/or maxtrans (this one is debatable); reduce records per block.
Buffer Busy Wait on Undo Header – Add rollback segments or increase the size of the undo area (automatic undo).
Buffer Busy Wait on Undo Block – Commit more often (but not too much); use larger rollback segments/area; try to fix the SQL.
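The drill-down described above (segment header vs. undo header vs. undo block vs. data block) can be sketched with a direct query against V$WAITSTAT:

```sql
-- Which class of block is accumulating buffer busy waits?
select class, count, time
  from v$waitstat
 where count > 0
 order by count desc;
```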
5. Latch Free. Latches are low-level queuing mechanisms (more accurately referred to as mutual exclusion mechanisms) used to protect shared memory structures in the system global area (SGA). Latches are like locks on memory that are very quickly obtained and released.
Latches
are used to prevent concurrent access to a shared memory structure. If
the latch is not available, a latch free miss is recorded. Most latch
problems are related to the failure to use bind variables (library
cache latch), redo generation issues (redo allocation latch), buffer
cache contention issues (cache buffers LRU chain), and hot blocks in
the buffer cache (cache buffers chain). There are also latch waits
related to bugs; check MetaLink for bug reports if you suspect this is
the case. When latch miss ratios are greater than 0.5 percent, you
should
investigate the issue. If
latch free waits are in the Top 5 Wait Events or high in the complete
Wait Events list, look at the latch-specific sections of the Statspack
report to see which latches are contended for.
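Outside of the Statspack latch sections, the contended-for latches can also be sketched with a query against V$LATCH, ordering by sleeps since sleeping is the expensive part of a latch miss:

```sql
-- Latches with the most sleeps since instance startup
select name, gets, misses, sleeps
  from v$latch
 where sleeps > 0
 order by sleeps desc;
```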
6. Enqueue. An enqueue is a lock that protects a shared
resource.
Locks protect shared resources, such as the data in a record, to prevent two sessions from updating the same data at the same time, e.g. when a SELECT FOR UPDATE is executed. An enqueue
includes
a queuing mechanism, which is FIFO (first in, first out). Note that
Oracle's latching mechanism is not FIFO. Enqueue waits usually point to
the ST enqueue, the HW enqueue, the TX4 enqueue, and the TM enqueue.
The ST enqueue is used for space management and allocation for
dictionary-managed
tablespaces. Use LMTs, or try to preallocate extents or at least make
the next extent larger for problematic dictionary-managed tablespaces.
HW enqueues are used with the high-water mark of a segment; manually
allocating the extents can circumvent this wait. TX4s are the most
common
enqueue waits. TX4 enqueue waits are usually the result of one of three
issues. The first issue is duplicates in a unique index; you need to
commit/rollback to free the enqueue. The second is multiple updates to
the same bitmap index fragment. Since a single bitmap fragment may
contain multiple rowids, you need to issue a commit or rollback to free
the enqueue when multiple users are trying to update the same fragment.
The third and most likely issue is when multiple users are updating the
same block. If there are no free ITL slots, a block-level lock
could
occur. You can easily avoid this scenario by increasing the initrans
and/or maxtrans to allow multiple ITL slots and/or by increasing the
pctfree on the table. Finally, TM enqueues occur during DML to prevent
DDL to the affected object. If you have foreign keys, be sure to index
them to avoid this general locking issue.
Enqueue - ST: Use LMTs or pre-allocate large extents.
Enqueue - HW: Pre-allocate extents above the HW (high water mark).
Enqueue - TX: Increase initrans and/or maxtrans on the table or index (TX4). Fix locking issues if TX6. Other causes: multiple updates to the same bitmap index fragment (TX4) and duplicates in a unique index (TX4).
Enqueue - TM: Index foreign keys; check the application's transaction management and locking of tables (DML locks).
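As an illustrative sketch of the initrans advice above (the table and index names are hypothetical; note that a changed INITRANS only affects newly formatted blocks, which is why the index is rebuilt):

```sql
-- Reserve more ITL slots per block on a hot table and its index
alter table orders initrans 4;
alter index orders_pk rebuild initrans 4;
```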
7. Log Buffer Space
This wait occurs because you are writing the log
buffer
faster than LGWR can write it to the redo logs, or because log switches
are too slow. To address this problem, increase the size of the
redo
log
files,
or increase the size of the log buffer, or get faster disks to write
to. You might even consider using solid-state disks, for their high
speed.
8. Log File Switch
All commit requests are waiting for "logfile switch
(archiving needed)" or "logfile switch (chkpt. Incomplete)." Ensure
that the archive disk is not full or slow. DBWR may be too slow because
of I/O. You may need to add more or larger redo logs, and you may
potentially
need to add database writers if the DBWR is the problem.
9. Log File Sync
A Log File Sync happens each time a commit (or rollback) takes place.
If there are a lot of waits in this area then you may want to examine
your application to see if you are committing too frequently (or at
least more than you need to). When a user commits or rolls back data,
the LGWR
flushes
the session's redo from the log buffer to the redo logs. The log file
sync process must wait for this to successfully complete. To reduce
wait events here, try to commit more records (try to commit a batch of
50 instead of one at a time, for example). Put redo logs on a
faster
disk, or alternate redo logs on different physical disks, to reduce the
archiving effect on LGWR. Don't use RAID 5, since it is very slow for
applications that write a lot; potentially consider using file system
direct I/O or raw devices, which are very fast at writing information.
The associated event, 'log file parallel write', is used by the redo log writer process, and it will indicate if your actual problem is with the log file I/O. Large wait times for this event can also be caused by having too few CPU resources available for the redo log writer process.
10. Idle Event. There are several idle wait events listed after the
output; you can ignore them. Idle events are generally listed at the
bottom of each section and include such things as SQL*Net message
to/from
client and other background-related timings. Idle events are listed in
the stats$idle_event table.
11. global
cache cr request: (OPS) This wait event shows the
amount of time that an instance has waited for a requested data block
for
a consistent read and the transferred block has not yet arrived at the
requesting instance. See Note 157766.1 'Sessions Wait Forever for
'global
cache cr request' Wait Event in OPS or RAC'. In some cases the 'global
cache cr request' wait event may be perfectly normal if large buffer
caches
are used and the same data is being accessed concurrently on multiple
instances.
In a perfectly tuned, non-OPS/RAC database, I/O wait events would be
the
top wait events but since we are avoiding I/O's with RAC and OPS the
'global
cache cr request' wait event often takes the place of I/O wait events.
12. Library Cache Pin. Library cache latch contention may be caused by not using bind variables; it is due to excessive parsing of SQL statements.
13. CPU time
This is not really a wait event (hence, the new name), but rather the
sum of
the CPU used by this session, or the amount of CPU time used during the
snapshot window. In a heavily loaded system, if the CPU time event is
the biggest event, that could point to some CPU-intensive processing
(for example, forcing the use of an index when a full scan should have
been used), which could be the cause of the bottleneck. When CPU Other
is a significant component of total Response Time the next step is to
find the SQL statements that access the most blocks. Block accesses are
also known as Buffer Gets and Logical I/Os. Statspack lists such SQL
statements in section SQL ordered by Gets.
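The lookup described above (SQL ordered by buffer gets) can also be done outside the Statspack report with a sketch like this against V$SQL:

```sql
-- Top SQL by logical I/O (buffer gets)
select buffer_gets, executions,
       round(buffer_gets / greatest(executions, 1)) as gets_per_exec,
       sql_text
  from v$sql
 order by buffer_gets desc;
```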
14. DB File Parallel Read
This wait event is used when Oracle performs parallel reads from multiple datafiles into non-contiguous buffers in memory (PGA or Buffer Cache). This is done during recovery operations or when buffer
prefetching is being used as an optimization i.e. instead of performing
multiple single-block reads. If this wait is an important component of
Wait Time, follow the same guidelines as 'db file sequential read'.
15. PX qref latch. This can often mean that the producers are producing data more quickly than the consumers can consume it. You could increase parallel_execution_message_size to try to eliminate some of these waits, or decrease the degree of parallelism. If the system workload is high, consider decreasing the degree of parallelism. If you have DEFAULT parallelism on your object you can decrease the value of PARALLEL_THREADS_PER_CPU. Keep in mind that DEFAULT degree = PARALLEL_THREADS_PER_CPU * number of CPUs.
16. Log File Parallel Writes. Log file parallel write waits occur when waiting for writes of REDO records
to the REDO log files to complete. The wait occurs in log writer (LGWR)
as part of normal activity of copying records from the REDO log buffer
to the current online log. The actual wait time is the time taken for
all the outstanding I/O requests to complete. Even though the writes
may be issued in parallel, LGWR needs to wait for the last I/O to be on
disk before the parallel write is considered complete. Hence the wait
time depends on the time it takes the OS to complete all requests.
Waits for log file parallel writes can be identified by looking at the
"Top 5 Timed Events" or "Wait Events" section of the Statspack report.
Log file parallel write waits can be reduced by moving log files to the
faster disks and/or separate disks where there will be less contention.
17. SQL*Net More Data to Client
This means the instance is sending a lot of data to the client. You can decrease this time by having the client fetch less data; perhaps the application does not need to bring back as much data as it does.
18. SQL*Net message to client
The “SQL*Net message to client” Oracle metric indicates the server
(foreground process) is sending a message to the client, and it can be
used to identify throughput issues over a network, especially
distributed databases with slow database links. The SQL*Net more data
to client event happens when Oracle writes multiple data buffers (sized
per SDU) in a single logical network call.
19. enq: TX - row lock contention:
Oracle maintains data consistency with the help of its locking mechanism. When a particular row is being modified by a process, through an UPDATE, DELETE or INSERT operation, Oracle tries to acquire a lock on that row. Only when the process has acquired the lock can it modify the row; otherwise it waits for the lock, and that wait situation triggers this event. The lock is released when the process holding it issues a COMMIT (or ROLLBACK). Once the lock is released, processes waiting on this event can acquire the lock on the row and perform their DML operation.
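On Oracle 10g and later, where V$SESSION exposes the wait event and the blocking session directly, the sessions stuck on this event can be sketched with:

```sql
-- Who is waiting on a row lock, and which session is blocking them
select sid, serial#, blocking_session, seconds_in_wait
  from v$session
 where event = 'enq: TX - row lock contention';
```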
The SQL that is stored in the shared pool SQL area (Library cache)
is reported in this section in different ways:
. SQL ordered by Buffer Gets
. SQL ordered by Physical Reads
. SQL ordered by Executions
. SQL ordered by Parse Calls
- SQL ordered by Gets:
This section reports the contents of the SQL area ordered by the number
of buffer gets and can be used to identify the most CPU Heavy SQL.
- Many DBAs feel that if the data is already contained within the
buffer cache the query should be efficient. This could not be
further from the truth. Retrieving more data than needed, even
from the buffer cache, requires CPU cycles and interprocess IO.
Generally speaking, physical I/O is not 10,000 times more expensive than a buffer get; it is in the neighborhood of 67 times, and almost free if the data is stored in the UNIX buffer cache.
- The statements of interest are those with a large number of gets per
execution especially if the number of executions is high.
- High buffer gets generally correlates with heavy CPU usage
- SQL ordered by Reads:
This section reports the contents of the SQL area ordered by the number
of reads from the data files and can be used to identify SQL causing
IO bottlenecks which consume the following resources.
- CPU time needed to fetch unnecessary data.
- File IO resources to fetch unnecessary data.
- Buffer resources to hold unnecessary data.
- Additional CPU time to process the query once the data is retrieved
into the buffer.
- SQL ordered by Executions:
This section reports the contents of the SQL area ordered by the number
of query executions. It is primarily useful in identifying the most
frequently used SQL within the database so that they can be monitored
for efficiency. Generally speaking, a small performance increase
on a frequently used query provides greater gains than a moderate
performance
increase on an infrequently used query. Possible reasons for high Reads per Exec are: use of unselective indexes that require large numbers of blocks to be fetched when such blocks are not cached well in the buffer cache; index fragmentation; a large Clustering Factor in the index; etc.
- SQL ordered by Parse Calls (9i Only):
This section shows the number of times a statement was parsed as
compared to the number of times it was executed. One to one
parse/executions
may indicate that:
- Bind variables are not being used.
- On RDBMS version 8.1.7.2 and higher the init.ora parameter session_cached_cursors was not set (100 is usually the suggested starting value).
See enhancement bug 1589185 for an explanation of the change that
shifts
some of the load from the library cache to the user session cache.
- The shared pool may be too small and the parse is not being retained
long enough for multiple executions.
- cursor_sharing is set to exact (this should NOT be changed without
considerable testing on the part of the client).
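The session_cached_cursors suggestion above can be sketched as follows; the value 100 is just the suggested starting point mentioned in the text, not a tuned recommendation:

```sql
-- Cache up to 100 closed cursors per session (takes effect at next startup)
alter system set session_cached_cursors = 100 scope = spfile;
```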
Generate Execution Plan for a given SQL statement
If you have identified one or more problematic SQL statements, you may want to check the execution plan. Remember the "Old Hash Value" from the report above (1279400914), then execute the script to generate the execution plan.
sqlplus perfstat/perfstat
SQL> @?/rdbms/admin/sprepsql.sql
Enter the Hash Value, in this example: 1279400914

SQL Text
~~~~~~~~
create table test as select * from all_objects
Known Optimizer Plan(s) for this Old Hash Value
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Shows all known Optimizer Plans for this database instance, and the Snap Id's
they were first found in the shared pool. A Plan Hash Value will appear
multiple times if the cost has changed
-> ordered by Snap Id
First First Plan
Snap Id Snap Time Hash Value Cost
--------- --------------- ------------ ----------
6 14 Nov 04 11:26 1386862634 52
Plans in shared pool between Begin and End Snap Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Shows the Execution Plans found in the shared pool between the begin and end
snapshots specified. The values for Rows, Bytes and Cost shown below are those
which existed at the time the first-ever snapshot captured this plan - these
values often change over time, and so may not be indicative of current values
-> Rows indicates Cardinality, PHV is Plan Hash Value
-> ordered by Plan Hash Value
--------------------------------------------------------------------------------
| Operation | PHV/Object Name | Rows | Bytes| Cost |
--------------------------------------------------------------------------------
|CREATE TABLE STATEMENT |----- 1386862634 ----| | | 52 |
|LOAD AS SELECT | | | | |
| VIEW | | 1K| 216K| 44 |
| FILTER | | | | |
| HASH JOIN | | 1K| 151K| 38 |
| TABLE ACCESS FULL |USER$ | 29 | 464 | 2 |
| TABLE ACCESS FULL |OBJ$ | 3K| 249K| 35 |
| TABLE ACCESS BY INDEX ROWID |IND$ | 1 | 7 | 2 |
| INDEX UNIQUE SCAN |I_IND1 | 1 | | 1 |
| NESTED LOOPS | | 5 | 115 | 16 |
| INDEX RANGE SCAN |I_OBJAUTH1 | 1 | 10 | 2 |
| FIXED TABLE FULL |X$KZSRO | 5 | 65 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| FIXED TABLE FULL |X$KZSPR | 1 | 26 | 14 |
| VIEW | | 1 | 13 | 2 |
| FAST DUAL | | 1 | | 2 |
--------------------------------------------------------------------------------
The statistics section shows the overall database statistics.
These are the statistics that the summary information is derived
from.
A list of the statistics maintained by the RDBMS kernel can be found
in Appendix C of the Oracle Reference manual for the version being
utilized.
Instance Activity Stats for DB: PHS2  Instance: phs2  Snaps: 100 -104

Statistic                                  Total   per Second    per Trans
--------------------------------- -------------- ------------ ------------
CPU used by this session                  84,161         23.4      3,825.5
CPU used when call started               196,346         54.5      8,924.8
CR blocks created                            709          0.2         32.2
DBWR buffers scanned                           0          0.0          0.0
DBWR checkpoint buffers written              245          0.1         11.1
DBWR checkpoints                              33          0.0          1.5
DBWR cross instance writes                    93          0.0          4.2
DBWR free buffers found                        0          0.0          0.0
....
Statistic                                  Total   per Second    per Trans
--------------------------------- -------------- ------------ ------------
branch node splits                         7,162          0.1          0.0
consistent gets                   12,931,850,777    152,858.8      3,969.5
current blocks converted for CR           75,709          0.9          0.0
db block changes                     343,632,442      4,061.9        105.5
db block gets                        390,323,754      4,613.8        119.8
hot buffers moved to head of LRU     197,262,394      2,331.7         60.6
leaf node 90-10 splits                    26,429          0.3          0.0
leaf node splits                         840,436          9.9          0.3
logons cumulative                         21,369          0.3          0.0
physical reads                       504,643,275      5,965.1        154.9
physical writes                       49,724,268        587.8         15.3
session logical reads             13,322,170,917    157,472.5      4,089.4
sorts (disk)                               4,132          0.1          0.0
sorts (memory)                         7,938,085         93.8          2.4
sorts (rows)                         906,207,041     10,711.7        278.2
table fetch continued row             25,506,365        301.5          7.8
table scans (long tables)                    111          0.0          0.0
table scans (short tables)             1,543,085         18.2          0.5
Instance Activity Terminology
Statistic |
Description |
Session Logical Reads | All reads cached in memory. Includes both consistent gets and also the db block gets. |
Consistent Gets | These are the reads of a block that are in the cache. They are NOT to be confused with consistent read (cr) version of a block in the buffer cache (usually the current version is read). |
Db block gets | These are blocks gotten in order to be changed. It MUST be the CURRENT block and not a cr block. |
Db block changes | These are the db block gets (above) that were actually changed. |
Physical Reads | Blocks not read from the cache. Either from disk, disk cache or O/S cache; there are also physical reads direct which bypass cache using Parallel Query (not in hit ratios). |
Of particular interest are the following statistics.
- CPU USED BY THIS SESSION, PARSE
TIME CPU or RECURSIVE CPU USAGE:
These numbers are useful to diagnose CPU saturation on the system
(usually
a query tuning issue). The formula to calculate the CPU usage breakdown
is:
Service (CPU) Time = other CPU + parse time CPU
Other CPU = "CPU used by this session" - parse time CPU
Some releases do not correctly store this data and can show huge numbers. The rule to decide whether you can trust these metrics is:
Trustworthy if:
(db version >= 8.1.7.2 and < 9.0.1)
OR (db version >= 9.0.1.1)
OR (db version >= 8.0.6.0 AND not using job_queue_processes AND CPU_PER_CALL = default)
recursive cpu usage = This component can be high if large amounts of PL/SQL are being processed. It is outside the scope of this document to go into detail on this, but you will need to identify your complete set of PL/SQL, including stored procedures, find the units with the highest CPU load, and optimize those. If most work done in PL/SQL is procedural processing (rather than executing SQL), a high recursive CPU usage can actually indicate a potential tuning effort.
parse time cpu = Parsing SQL statements is a heavy operation that should be avoided by reusing SQL statements as much as possible. In precompiler programs, unnecessary parsing of implicit SQL statements can be avoided by increasing the cursor cache (MAXOPENCURSORS parameter) and by reusing cursors. In programs using the Oracle Call Interface, you need to write the code so that it re-executes (instead of reparses) cursors with frequently executed SQL statements. The v$sql view contains PARSE_CALLS and EXECUTIONS columns that can be used to identify SQL that is parsed often or is only executed once per parse.
other cpu = The source of other CPU is primarily the handling of buffers in the buffer cache. It can generally be assumed that the CPU time spent by a SQL statement is approximately proportional to the number of buffer gets for that statement; hence, you should identify and sort SQL statements by buffer gets in v$sql. In your Statspack report, look at the part 'SQL ordered by Gets'. Start tuning SQL statements from the top of this list. In Oracle9i, the v$sql view contains a column, CPU_TIME, which directly shows the CPU time associated with executing the SQL statement.
- DBWR BUFFERS SCANNED: the number of buffers looked at when scanning the lru portion of the buffer cache for dirty buffers to make clean. Divide by "dbwr lru scans" to find the average number of buffers scanned. This count includes both dirty and clean buffers. The average buffers scanned may be different from the average scan depth due to write batches filling up before a scan is complete. Note that this includes scans for reasons other than make free buffer requests.
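The "divide by" step above, sketched with made-up statistic values:

```python
# Hypothetical values for the two statistics named above.
dbwr_buffers_scanned = 52_000  # "DBWR buffers scanned"
dbwr_lru_scans = 1_300         # "dbwr lru scans"

# Average number of buffers (dirty and clean) examined per LRU scan.
avg_buffers_scanned = dbwr_buffers_scanned / dbwr_lru_scans
print(avg_buffers_scanned)  # 40.0
```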
- DBWR CHECKPOINTS: the number of checkpoint messages that were sent to DBWR, not necessarily the total number of actual checkpoints that took place. During a checkpoint there is a slight decrease in performance, since data blocks are being written to disk and that causes I/O. If the number of checkpoints is reduced, the performance of normal database operations improves, but recovery after instance failure is slower.
- DBWR TIMEOUTS: the number of timeouts when DBWR had been idle since the last timeout. These are the occasions on which DBWR looked for buffers to write while idle.
- DIRTY BUFFERS INSPECTED: the number of times a foreground process, while looking for a buffer to reuse, encountered a dirty buffer that had aged out through the LRU queue. This should be zero if DBWR is keeping up with the foregrounds.
- FREE BUFFER INSPECTED: the number of buffers skipped over from the end of the LRU queue in order to find a free buffer. The difference between this and "dirty buffers inspected" is the number of buffers that could not be used because they were busy or needed to be written after rapid aging out. Such buffers may have a user or a waiter, or may be in the middle of being read or written.
- RECURSIVE CALLS: Recursive calls occur because of cache misses and segment extension. In general, if recursive calls is greater than 30 per process, the data dictionary cache should be optimized and segments should be rebuilt with storage clauses that have fewer, larger extents. Segments include tables, indexes, rollback segments, and temporary segments.
NOTE: PL/SQL can generate extra recursive calls which may be unavoidable.
- REDO BUFFER ALLOCATION RETRIES: the total number of retries necessary to allocate space in the redo buffer. Retries are needed either because the redo writer has fallen behind, or because an event (such as a log switch) is occurring.
- REDO LOG SPACE REQUESTS: indicates how many times a user process waited for space in the redo log buffer. Try increasing the init.ora parameter LOG_BUFFER so that zero Redo Log Space Requests are made.
- REDO WASTAGE: the number of bytes "wasted" because redo blocks needed to be written before they were completely full. Early writing may be needed to commit transactions, to be able to write a database buffer, or to switch logs.
- SUMMED DIRTY QUEUE LENGTH:
the sum of the lruw queue length after every write request completes.
(divide by write requests to get average
queue length after write completion)
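The parenthetical average above, sketched with hypothetical statistic values:

```python
# Hypothetical values for the statistics named above.
summed_dirty_queue_length = 9_450  # "summed dirty queue length"
write_requests = 2_100             # "write requests"

# Average lruw queue length after a write request completes.
avg_queue_len = summed_dirty_queue_length / write_requests
print(avg_queue_len)  # 4.5
```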
- TABLE FETCH BY ROWID: the
number of rows that were accessed by a rowid. This includes rows
that were accessed using an index and rows that were accessed using the
statement where rowid = 'xxxxxxxx.xxxx.xxxx'.
- TABLE FETCH BY CONTINUED ROW: indicates the number of rows fetched that are chained to another block. In some cases (e.g. tables with long columns) this is unavoidable, but the ANALYZE table command should be used to investigate the chaining further, and where possible it should be eliminated by rebuilding the table.
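One way to gauge how bad chaining is (a sketch with hypothetical statistic values; the percentage threshold worth worrying about depends on the workload):

```python
# Hypothetical values for the two row-fetch statistics discussed above.
table_fetch_by_rowid = 1_250_000      # "table fetch by rowid"
table_fetch_continued_row = 37_500    # "table fetch continued row"

# Share of rowid fetches that had to follow a chained/migrated row.
chained_ratio_pct = 100 * table_fetch_continued_row / table_fetch_by_rowid
print(round(chained_ratio_pct, 1))  # 3.0
```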
- Table Scans (long tables) is
the total number of full table scans performed on tables with more than
5 database blocks. If the number of full table scans is high the
application should be tuned to effectively use Oracle indexes. Indexes,
if they exist, should be used
on long tables if less than 10-20% (depending on parameter settings and
CPU count) of the rows from the table are returned. If this is not the
case, check the db_file_multiblock_read_count parameter setting. It may
be too high. You may also need to tweak optimizer_index_caching
and optimizer_index_cost_adj.
- Table Scans (short tables) is
the number of full table scans performed
on tables with less than 5 database blocks. It is optimal to
perform
full table scans on short tables rather than using indexes.
IO Activity Input/Output (IO) statistics for the instance are listed
in the following sections/formats:
- Tablespace I/O Stats for DB: Ordered by total IO per tablespace.
- File I/O Stats for DB: Ordered alphabetically by tablespace, filename.
If the statistic "Buffer Waits" for a tablespace is greater than 1000, you may want to consider tablespace reorganization in order to spread its tables across other tablespaces.
In Oracle 8.1.7 several other columns were added, as follows:
- Avg. Reads / Second
- Avg. Blocks / Read
- Avg. Writes / Second
- Buffer Waits
- Avg. Buffer Waits / Millisecond
Note that Oracle considers average read times of greater than 20 ms
unacceptable. If a datafile consistently has average read times
of
20 ms or greater then:
- The queries against the contents of the owning tablespace
should be examined and tuned so that less data is retrieved.
- If the tablespace contains indexes, another option is to compress
the indexes so that they require less space and hence, less IO.
- The contents of that datafile should be redistributed across
several disks/logical volumes to more easily accommodate the load.
- If the disk layout seems optimal, check the disk controller
layout. It may be that the datafiles need to be distributed
across
more disk sets.
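The 20 ms check above can be sketched as follows; the file names and I/O statistics are hypothetical:

```python
# Hypothetical per-datafile statistics: (file name, reads, total read time in ms).
file_io = [
    ("users01.dbf",  120_000, 2_900_000),
    ("index01.dbf",  300_000, 3_300_000),
    ("system01.dbf",  45_000,   450_000),
]

THRESHOLD_MS = 20  # Oracle's guideline for acceptable average read time

# Flag datafiles whose average read time meets or exceeds the threshold.
slow = [(name, total / reads) for name, reads, total in file_io
        if total / reads >= THRESHOLD_MS]
for name, avg in slow:
    print(f"{name}: avg read {avg:.1f} ms")
```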
The buffer statistics are comprised of two sections:
- Buffer Pool Statistics: This section can have
multiple
entries if multiple buffer pools are allocated. A baseline of the
database's buffer
pool statistics should be available
to compare with the current statspack buffer pool statistics. A
change
in that pattern unaccounted for by a change in workload should be a
cause
for concern. Also check the Buffer
Pool Advisory to identify if increasing that parameter
(db_cache_size) would help to reduce Physical Reads.
- Buffer Wait Statistics: This section shows a breakdown
of each type of object waited for. This section follows the
Instance
Recovery Stats for DB.
Buffer wait Statistics for DB: PHS2 Instance: phs2 Snaps: 100 - 104
-> ordered by wait time desc, waits desc
Tot Wait Avg
Class Waits Time (cs) Time (cs)
------------------ ----------- ---------- ---------
undo header 42 21 1
data block 18 6 0
The above shows no real contention. Typically, when there is buffer contention, it is due to data block contention with large average wait times, like the example below:
Buffer wait Statistics for DB: GLOVP Instance: glovp Snaps: 454 - 455
Tot Wait Avg
Class Waits Time (cs) Time (cs)
------------------ ----------- ---------- ---------
data block 9,698 17,097 2
undo block 210 1,225 6
segment header 259 367 1
undo header 259 366 1
file header block 24 1 33
system undo header 1 0 0
Instance Recovery Statistics
This section was added in 9i and is useful for monitoring recovery and redo information.
PGA Memory Statistics
This section was added in 9i and helps when using the new model for allocating PGA in Oracle9i via PGA_AGGREGATE_TARGET. It is particularly useful when monitoring session memory usage on Windows servers.
An enqueue is simply a locking mechanism. This section is
very
useful and must be used when the wait event "enqueue" is listed in the
"Top 5 timed events".
Enqueue activity for DB: S901 Instance: S901 Snaps: 2 - 3
-> Enqueue stats gathered prior to 9i should not be compared with 9i data
-> ordered by waits desc, Waits desc
Avg Wt Wait
Eq Requests Succ Gets Failed Gets Waits Time (ms) Time (s)
-- ------------ ------------ ----------- ----------- ------------- ------------
TC 44,270 44,270 0 8,845 619.37 5,478
TX 13,072,864 13,072,809 0 4,518 641.72 2,899
CU 5,532,494 5,532,494 0 33,355 4.78 159
SQ 418,547 418,547 0 1,251 15.10 19
PS 5,950,717 5,189,366 761,354 69,381 .19 13
US 4,912 4,912 0 282 45.16 13
PR 8,325 8,325 0 11 213.64 2
CI 67,060 67,060 0 15 18.93 0
JD 165,560 165,560 0 1 261.00 0
HW 56,401 56,401 0 3 2.67 0
The action to take depends on the lock type that is causing the most problems. The most common lock waits are generally for:
- TX (Transaction Lock): Generally due to application concurrency
mechanisms, or table setup issues. The TX lock is acquired when a
transaction initiates its first change and is held until the
transaction does a COMMIT or ROLLBACK. It is used mainly as a queuing
mechanism so that other resources can wait for a transaction to
complete.
- TM (DML enqueue): Generally due to application issues,
particularly if foreign key constraints have not been indexed. This
lock/enqueue is acquired when performing an insert, update, or delete
on a parent or child table.
- ST (Space management enqueue): Usually caused by too much space management activity on the database, for example CREATE TABLE AS SELECT on large tables on busy instances, small extent sizes, or lots of sorting to disk.
Undo information is provided in the following sections:
- Undo Segment Summary
- Undo Segment Stats
The examples below show typical performance problems related to undo (rollback) segments:
- Undo Segment Summary for DB
Undo Segment Summary for DB: S901 Instance: S901 Snaps: 2 - 3
The description of the view V$UNDOSTAT in the Oracle9i Database Reference guide provides some insight into the column definitions. Should the client encounter SMU problems, monitoring this view every few minutes would provide more useful information.
-> Undo segment block stats:
-> uS - unexpired Stolen, uR - unexpired Released, uU - unexpired reUsed
-> eS - expired Stolen, eR - expired Released, eU - expired reUsed
Undo Undo Num Max Qry Max Tx Snapshot Out of uS/uR/uU/
TS# Blocks Trans Len (s) Concurcy Too Old Space eS/eR/eU
---- -------------- ---------- -------- ---------- -------- ------ -------------
1 20,284 1,964 8 12 0 0 0/0/0/0/0/0
- Undo Segment Stats for DB
Undo Segment Stats for DB: S901 Instance: S901 Snaps: 2 - 3
This section provides a more detailed look at the statistics in the previous section by listing the information as it appears in each snapshot.
-> ordered by Time desc
Undo Num Max Qry Max Tx Snap Out of uS/uR/uU/
End Time Blocks Trans Len (s) Concy Too Old Space eS/eR/eU
------------ ------------ -------- ------- -------- ------- ------ -------------
12-Mar 16:11 18,723 1,756 8 12 0 0 0/0/0/0/0/0
12-Mar 16:01 1,561 208 3 12 0 0 0/0/0/0/0/0
It should be noted that 9i introduces an optional init.ora parameter called UNDO_RETENTION which allows the DBA to specify how long the system will attempt to retain undo information for a committed transaction without being overwritten or recaptured. This parameter, based in units of wall-clock seconds, is defined universally for all undo segments.
Use of UNDO_RETENTION can potentially increase the size of the
undo
segment for a given period of time, so the retention period should not
be arbitrarily set too high. The UNDO tablespace still must be sized
appropriately.
The following calculation can be used to determine how much space a given undo segment will consume for a set value of UNDO_RETENTION:
Undo Segment Space Required = (undo_retention_time * undo_blocks_per_second * db_block_size)
As an example, an UNDO_RETENTION of 5 minutes (default) with 50 undo blocks/second (8k blocksize) will generate:
Undo Segment Space Required = (300 seconds * 50 blocks/ seconds * 8K/block) = 120 M
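The worked example above, reproduced as a calculation (note that 300 s x 50 blocks/s x 8 KB is about 117 MB, which the text rounds to 120 M):

```python
# Inputs from the worked example above.
undo_retention_s = 300    # UNDO_RETENTION, 5 minutes (default)
undo_blocks_per_s = 50    # observed undo block generation rate
block_size = 8192         # 8K database block size

# Undo Segment Space Required = retention * blocks/s * block size
space_bytes = undo_retention_s * undo_blocks_per_s * block_size
print(space_bytes)                   # 122880000
print(space_bytes // (1024 * 1024))  # 117  (MB, ~"120 M" in the text)
```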
The retention information (transaction commit time) is stored in every transaction table block and each extent map block. When the retention period has expired, SMON will be signaled to perform undo reclaims, done by scanning each transaction table for undo timestamps and deleting the information from the undo segment extent map. Only during extreme space constraint issues will retention period not be obeyed.
Latch information is provided in the following three sections:
. Latch Activity
. Latch Sleep breakdown
. Latch Miss Sources
This information should be checked whenever the "latch free" wait
event or other
latch wait events experience long waits. This section is particularly
useful for determining latch contention
on an instance. Latch contention generally indicates resource
contention
and supports indications of it in other sections.
Latch contention is indicated by a Pct Miss of greater than 1.0% or
a relatively high value in Avg Sleeps/Miss.
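The two indicators above can be checked mechanically; a sketch with hypothetical latch statistics (the 1.0 sleeps/miss cutoff here is an illustrative assumption, since "relatively high" depends on the system):

```python
# Hypothetical latch statistics: (name, gets, misses, sleeps).
latches = [
    ("cache buffers chains", 50_000_000, 900_000, 45_000),
    ("shared pool",           2_000_000,  10_000, 30_000),
    ("library cache",         8_000_000,  40_000,  2_000),
]

flagged = []
for name, gets, misses, sleeps in latches:
    pct_miss = 100 * misses / gets
    sleeps_per_miss = sleeps / misses if misses else 0.0
    # Contention indicators: Pct Miss > 1.0 %, or high Avg Sleeps/Miss.
    if pct_miss > 1.0 or sleeps_per_miss > 1.0:
        flagged.append(name)
        print(f"{name}: {pct_miss:.2f}% miss, {sleeps_per_miss:.2f} sleeps/miss")
```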
While each latch can indicate contention on some resource, the more
common latches to watch are:
cache buffer chain= The cache buffer chain latch protects the hash chain of cache buffers and is used for each access to cache buffers. Contention for this latch can often only be reduced by reducing the amount of access to cache buffers. Using the X$BH fixed table can identify whether some hash chains have many buffers associated with them. Often a single hot block, such as an index root block, can cause contention for this latch. In Oracle9i, this is a shared latch, which minimizes contention for blocks being read only. Contention on this latch confirms a hot block issue.
shared pool= The shared pool latch is heavily used during parsing, in particular during hard parses. If your application is written so that it generally uses literals instead of bind variables, you will have high contention on this latch. Contention on this latch in conjunction with reloads in the SQL Area of the library cache section indicates that the shared pool is too small. In release 8.1.6 and later, you can set the cursor_sharing parameter in init.ora to the value 'force' to reduce hard parsing and reduce some of the contention for the shared pool latch. Applications that are coded to parse only once per cursor and execute multiple times will almost completely avoid contention for the shared pool latch.
library cache= The library
cache latch is heavily used during both hard and soft parsing. If you
have high contention for this latch, your application should be
modified to avoid parsing if at all possible. Setting the
cursor_sharing parameter in init.ora to the value ‘force’ provides some
reduction in the library cache latch needs for hard parses, and setting
the session_cached_cursors sufficiently high provides some reduction in
the library cache latch needs for repeated soft parsing within a single
session. There is minor contention for this latch involved in executing
SQL statements, which can be reduced further by setting
cursor_space_for_time=true, if the application is properly written to
parse statements once and execute multiple times.
row cache= The row cache latch
protects the data dictionary information, such as information about
tables and columns. During hard parsing, this latch is used
extensively. In release 8.1.6 and above, the cursor_sharing parameter
can be used to completely avoid the row cache latch lookup during
parsing.
cache buffer lru chain= The buffer cache has a set of chains of LRU blocks, each protected by one of these latches. Contention for this latch can often be reduced by increasing the db_block_lru_latches parameter or by reducing the amount of access to cache buffers.
This section of the report shows information about the different
sub-areas activity
in the library cache.
Library Cache Activity for DB: S901 Instance: S901 Snaps: 2 -3
->"Pct Misses" should be very low
Get Pct Pin Pct Invali-
Namespace Requests Miss Requests Miss Reloads dations
--------------- ------------ ------ -------------- ------ ---------- --------
BODY 310,879 0.0 310,880 0.0 0 0
CLUSTER 1,009 0.3 1,007 0.6 0 0
INDEX 14,713 0.2 17,591 0.3 0 0
SQL AREA 14,184,204 0.1 313,089,592 1.9 5,793,355 38,421
TABLE/PROCEDURE 46,190,602 0.0 91,843,902 0.0 825 0
TRIGGER 148,809 0.0 148,809 0.0 2 0
Consistent (not sporadic) values in Pct Misses or Reloads in the SQL Area, Tables/Procedures, or Trigger rows indicate that the shared pool may be too small. Pct Misses or Reloads in the Index row indicate that the buffer cache is too small. (The Index row is no longer available in 9i.)
Values in Invalidations in the SQL Area indicate that a table
definition
changed while a query was being run against it or a PL/SQL package
being
used was recompiled.
Advisory Sections
Use the advisory sections to evaluate the corresponding memory size parameters. The first table below is the Buffer Pool Advisory, used to evaluate the buffer cache size parameter (db_cache_size); the PGA Aggregate, Shared Pool, and SGA Target advisories follow.
P | Size for Est (M) | Size Factor | Buffers for Estimate | Est Phys Read Factor | Estimated Physical Reads
D | 72 | 0.10 | 8,622 | 4.35 | 2,937,669,401
D | 144 | 0.20 | 17,244 | 2.03 | 1,371,604,952
D | 216 | 0.30 | 25,866 | 1.54 | 1,038,346,123
D | 288 | 0.40 | 34,488 | 1.42 | 959,213,827
D | 360 | 0.49 | 43,110 | 1.33 | 899,213,101
D | 432 | 0.59 | 51,732 | 1.25 | 844,100,667
D | 504 | 0.69 | 60,354 | 1.17 | 793,922,143
D | 576 | 0.79 | 68,976 | 1.11 | 752,192,096
D | 648 | 0.89 | 77,598 | 1.06 | 716,041,579
D | 720 | 0.99 | 86,220 | 1.01 | 680,211,989
D | 728 | 1.00 | 87,178 | 1.00 | 676,024,586
D | 792 | 1.09 | 94,842 | 0.95 | 642,521,274
D | 864 | 1.19 | 103,464 | 0.89 | 603,505,502
D | 936 | 1.29 | 112,086 | 0.84 | 570,790,229
D | 1,008 | 1.38 | 120,708 | 0.80 | 543,401,299
D | 1,080 | 1.48 | 129,330 | 0.77 | 518,133,473
D | 1,152 | 1.58 | 137,952 | 0.73 | 491,055,525
D | 1,224 | 1.68 | 146,574 | 0.70 | 470,256,432
D | 1,296 | 1.78 | 155,196 | 0.66 | 447,139,441
D | 1,368 | 1.88 | 163,818 | 0.59 | 396,093,338
D | 1,440 | 1.98 | 172,440 | 0.51 | 347,627,804
This section estimates how increasing or decreasing the buffer cache size would decrease or increase physical reads. The figures are estimates, not actual measurements.
The starting point is "Size Factor" = 1.00, the current buffer cache allocation. In this example, 728 MB is allocated to the buffer cache, with an estimated 676,024,586 physical reads. Increasing the buffer cache to 1,440 MB ("Size Factor" = 1.98) reduces the estimate to 347,627,804 physical reads; that is, allocating an additional 712 MB saves an estimated 328,396,782 physical reads.
Conversely, reducing the buffer cache to 216 MB ("Size Factor" = 0.30) increases estimated physical reads to 1,038,346,123.
These statistics are an input to the DBA when tuning buffer cache memory.
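Reading the advisory is just a matter of comparing the candidate row against the Size Factor 1.00 row; a sketch using the two rows quoted above:

```python
# (size in MB, estimated physical reads), taken from the advisory rows above.
current = (728, 676_024_586)      # Size Factor 1.00
candidate = (1_440, 347_627_804)  # Size Factor 1.98

extra_mb = candidate[0] - current[0]
reads_saved = current[1] - candidate[1]
print(extra_mb)     # 712
print(reads_saved)  # 328396782
```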
PGA Target Est (MB) | Size Factr | W/A MB Processed | Estd Extra W/A MB Read/Written to Disk | Estd PGA Cache Hit % | Estd PGA Overalloc Count
384 | 0.13 | 812,978.60 | 178,424.72 | 82.00 | 3,834
768 | 0.25 | 812,978.60 | 172,475.22 | 82.00 | 3,763
1,536 | 0.50 | 812,978.60 | 30,048.28 | 96.00 | 0
2,304 | 0.75 | 812,978.60 | 25,473.80 | 97.00 | 0
3,072 | 1.00 | 812,978.60 | 25,473.80 | 97.00 | 0
3,686 | 1.20 | 812,978.60 | 10,796.70 | 99.00 | 0
4,301 | 1.40 | 812,978.60 | 10,796.70 | 99.00 | 0
4,915 | 1.60 | 812,978.60 | 10,796.70 | 99.00 | 0
5,530 | 1.80 | 812,978.60 | 10,796.70 | 99.00 | 0
6,144 | 2.00 | 812,978.60 | 10,796.70 | 99.00 | 0
9,216 | 3.00 | 812,978.60 | 10,796.70 | 99.00 | 0
12,288 | 4.00 | 812,978.60 | 10,796.70 | 99.00 | 0
18,432 | 6.00 | 812,978.60 | 10,796.70 | 99.00 | 0
24,576 | 8.00 | 812,978.60 | 10,796.70 | 99.00 | 0
Similar to the Buffer Pool Advisory, this section shows how increasing or decreasing PGA memory would increase or decrease the Estd PGA Cache Hit %.
The starting point is "Size Factr" = 1.00, the current PGA allocation. In this example, 3,072 MB is allocated to the PGA, giving an Estd PGA Cache Hit % of 97, which is good. Even increasing the PGA to 3,686 MB would gain only 2% in Estd PGA Cache Hit %, so increasing the PGA further is not advisable.
Shared Pool Size(M) | SP Size Factr | Est LC Size (M) | Est LC Mem Obj | Est LC Time Saved (s) | Est LC Time Saved Factr | Est LC Load Time (s) | Est LC Load Time Factr | Est LC Mem Obj Hits
252 | 0.90 | 28 | 6,586 | 6,921,060 | 0.93 | 614,333 | 10.64 | 58,238,179
280 | 1.00 | 55 | 7,434 | 7,477,642 | 1.00 | 57,751 | 1.00 | 58,323,146
308 | 1.10 | 82 | 8,379 | 8,020,189 | 1.07 | 1 | 0.00 | 58,403,382
336 | 1.20 | 109 | 9,352 | 8,461,984 | 1.13 | 1 | 0.00 | 58,459,714
364 | 1.30 | 136 | 10,362 | 8,790,828 | 1.18 | 1 | 0.00 | 58,496,766
392 | 1.40 | 163 | 11,293 | 9,040,418 | 1.21 | 1 | 0.00 | 58,523,514
420 | 1.50 | 190 | 12,185 | 9,242,662 | 1.24 | 1 | 0.00 | 58,544,538
448 | 1.60 | 217 | 13,193 | 9,414,811 | 1.26 | 1 | 0.00 | 58,561,861
476 | 1.70 | 244 | 14,377 | 9,567,814 | 1.28 | 1 | 0.00 | 58,576,296
504 | 1.80 | 271 | 15,293 | 9,703,054 | 1.30 | 1 | 0.00 | 58,588,125
532 | 1.90 | 298 | 16,196 | 9,814,500 | 1.31 | 1 | 0.00 | 58,597,966
560 | 2.00 | 325 | 17,328 | 9,907,118 | 1.32 | 1 | 0.00 | 58,606,158
Similar to the Buffer Pool and PGA advisories, this section shows how increasing or decreasing shared pool memory would increase or decrease the Est LC Load Time (s).
The starting point is "SP Size Factr" = 1.00, the current shared pool allocation. In this example, 280 MB is allocated to the shared pool, with an Est LC Load Time of 57,751 seconds. Increasing the shared pool to 308 MB brings the Est LC Load Time down to 1 second, so the shared pool should be set to 308 MB.
SGA Target Size (M) | SGA Size Factor | Est DB Time (s) | Est Physical Reads
512 | 0.50 | 3,062,724 | 1,038,371,906
768 | 0.75 | 2,689,285 | 844,082,787
1,024 | 1.00 | 2,417,105 | 676,023,376
1,280 | 1.25 | 2,246,714 | 570,766,536
1,536 | 1.50 | 2,084,044 | 470,241,860
1,792 | 1.75 | 1,885,603 | 347,611,220
2,048 | 2.00 | 1,885,608 | 347,611,220
This section shows how increasing or decreasing SGA memory would decrease or increase Est Physical Reads.
The starting point is "SGA Size Factor" = 1.00, the current SGA allocation. In this example, 1,024 MB is allocated to the SGA, with an Est Physical Reads of 676,023,376. Increasing the SGA by 1,024 MB, i.e. to 2,048 MB, brings Est Physical Reads down by 328,412,156 to 347,611,220. Since that is roughly a 50% reduction in Est Physical Reads, the SGA should be increased to 2,048 MB.
Undo TS# | Num Undo Blocks (K) | Number of Transactions | Max Qry Len (s) | Max Tx Concurcy | Min/Max TR (mins) | STO/OOS | uS/uR/uU/eS/eR/eU
16 | 176.25 | 233,218 | 135 | 45 | 15/15.25 | 0/0 | 0/0/0/0/0/0
This section provides information about UNDO segments.
· Min/Max TR (mins) - the minimum and maximum tuned retention minutes for undo data. This helps in setting the UNDO_RETENTION database parameter; in this example it could be set to 15.25 minutes, i.e. 915 seconds.
· Max Qry Len (s) - the maximum query length in seconds. In this example the max query length is 135 seconds.
· STO/OOS - the counts of Snapshot Too Old and Out Of Space errors during the snapshot period. In this example, 0 errors occurred.
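The minute-to-second conversion above, plus a sanity check that the suggested retention covers the longest query seen in the report:

```python
# Values taken from the Undo Segment Summary row above.
max_tuned_retention_min = 15.25  # Max TR (mins)
max_query_len_s = 135            # Max Qry Len (s)

undo_retention_s = int(max_tuned_retention_min * 60)
print(undo_retention_s)                     # 915
print(undo_retention_s >= max_query_len_s)  # True: retention covers the longest query
```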
Owner | Tablespace Name | Object Name | Subobject Name | Obj. Type | Logical Reads | %Total
RRA_OWNER | RRA_INDEX | RRA_REPBAL_IX_03 | | INDEX | 817,720,480 | 73.37
RRA_OWNER | RRA_DATA | TRRA_BALANCE_STATUS | | TABLE | 68,558,256 | 6.15
RRA_OWNER | RRA_DATA | TRRA_TRANSACTION_STATUS | | TABLE | 58,728,272 | 5.27
RRA_OWNER | RRA_INDEX | RRA_MSGOUT_IX_05 | | INDEX | 17,506,640 | 1.57
RRA_OWNER | RRA_DATA | TRRA_SENT_REGISTER_FILENAMES | | TABLE | 6,542,848 | 0.59
This section lists segments sorted by the "Logical Reads" column in descending order, identifying the segments with the most logical reads. Most of the associated SQL can be found under section SQL Statistics -> SQL ordered by Gets.
Owner | Tablespace Name | Object Name | Subobject Name | Obj. Type | Physical Reads | %Total
RRA_OWNER | RRA_DATA | TRRA_BALANCE_STATUS | | TABLE | 2,112,621 | 32.09
RRA_OWNER | RRA_INDEX | RRA_PROCSTATUS_IX_03 | | INDEX | 1,649,590 | 25.05
RRA_OWNER | RRA_INDEX | RRA_BALSTAT_IX_03 | | INDEX | 776,458 | 11.79
RRA_OWNER | RRA_DATA | TRRA_TRANSACTIONS | | TABLE | 381,302 | 5.79
RRA_OWNER | RRA_INDEX | RRA_PROCSTATUS_IX_01 | | INDEX | 259,326 | 3.94
This section lists segments sorted by the "Physical Reads" column in descending order, identifying the segments with the most physical reads.
Queries using these segments should be analysed to check whether any full table scans (FTS) are happening against them. If so, appropriate indexes should be created to eliminate the full table scans. Most of the associated SQL can be found under section SQL Statistics -> SQL ordered by Reads.
Owner | Tablespace Name | Object Name | Subobject Name | Obj. Type | Row Lock Waits | % of Capture
RRA_OWNER | RRA_DATA | RRA_SRF_IX_04 | | INDEX | 6,907 | 20.18
RRA_OWNER | RRA_INDEX | RRA_PROCSTATUS_IX_03 | | INDEX | 3,918 | 11.45
RRA_OWNER | RRA_INDEX | RRA_REPTRN_PK | | INDEX | 3,118 | 9.11
RRA_OWNER | RRA_DATA | TRRA_BALANCE_STATUS | | TABLE | 1,750 | 5.11
RRA_OWNER | RRA_INDEX | RRA_PROCSTATUS_IX_02 | | INDEX | 1,178 | 3.44
This section lists segments sorted by the "Row Lock Waits" column in descending order, identifying the segments with the most row-lock contention during the snapshot period.
DML statements using these segments should be analysed further to check whether the concurrency impact of row locking can be reduced.
Owner | Tablespace Name | Object Name | Subobject Name | Obj. Type | ITL Waits | % of Capture
RRA_OWNER | RRA_INDEX | RRA_MSGBLOBS_IX_01 | | INDEX | 23 | 27.06
RRA_OWNER | RRA_INDEX | RRA_TRN_IX_08 | | INDEX | 15 | 17.65
RRA_OWNER | RRA_INDEX | RRA_INT_CLOB_IX_01 | | INDEX | 10 | 11.76
RRA_OWNER | RRA_INDEX | RRA_TRN_IX_05 | | INDEX | 10 | 11.76
RRA_OWNER | RRA_INDEX | RRA_TRN_IX_10 | | INDEX | 8 | 9.41
Whenever a transaction modifies a segment block, it first adds its transaction id to the Interested Transaction List (ITL) of that block. The number of ITL slots is a block-level configurable setting; that many ITL slots are created in each block.
An ITL wait happens when more transactions try to modify the same block at the same time than there are ITL slots available.
The total waits in this example are very low (23 at most), so increasing the ITL setting is not recommended.
Owner | Tablespace Name | Object Name | Subobject Name | Obj. Type | Buffer Busy Waits | % of Capture
RRA_OWNER | RRA_INDEX | RRA_REPTRN_PK | | INDEX | 4,577 | 26.69
RRA_OWNER | RRA_INDEX | RRA_REPBAL_IX_03 | | INDEX | 1,824 | 10.64
RRA_OWNER | RRA_INDEX | RRA_PROCSTATUS_IX_03 | | INDEX | 1,715 | 10.00
RRA_OWNER | RRA_INDEX | RRA_MSGINB_PK | | INDEX | 827 | 4.82
RRA_OWNER | RRA_INDEX | RRA_PROCSTATUS_PK | | INDEX | 696 | 4.06
Buffer busy waits happen when more than one transaction tries to access the same block at the same time. The first transaction to acquire the lock on the block proceeds, while the other transactions wait for it to finish.
If more than one instance of a process continuously polls the database by executing the same SQL (for example, to check whether any records are available for processing), the same block is read concurrently by all instances of the process, and this results in the buffer busy waits event.
Namespace | Get Requests | Pct Miss | Pin Requests | Pct Miss | Reloads | Invali-dations
BODY | 17 | 35.29 | 1,260,957 | 0.38 | 1,875 | 0
CLUSTER | 14,335 | 1.10 | 41,182 | 0.49 | 43 | 0
INDEX | 113 | 68.14 | 340 | 37.94 | 35 | 0
SQL AREA | 846,225 | 3.54 | 14,397,753 | 26.05 | 3,143,888 | 3,028,117
TABLE/PROCEDURE | 314,825 | 0.64 | 3,419,828 | 13.37 | 256,865 | 0
TRIGGER | 4 | 0.00 | 200,436 | 0.69 | 1,085 | 0
"Pct Misses" should be very low. In this example, Pct Miss is above 10 for some of the library cache components, which indicates that the shared pool is not sufficiently sized. If AMM (Automatic Memory Management) is used, the DBA can increase the SGA component; if AMM is not used, increase the SHARED_POOL memory component.
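Pct Miss in this section is simply misses divided by requests, times 100. Recomputing it for the INDEX row quoted above (113 get requests and a Pct Miss of 68.14 imply about 77 get misses; the miss count is back-computed here, not taken from the report):

```python
# Values for the INDEX namespace row above; get_misses is inferred.
get_requests = 113
get_misses = 77  # ~68.14% of 113

pct_miss = 100 * get_misses / get_requests
print(round(pct_miss, 2))  # 68.14, matching the report
```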