All Wait Events


IO Waits Contents
  1. 1 Standard I/O
    1. 1.1 db file sequential read
    2. 1.2 db file scattered read
    3. 1.3 db file parallel read
    4. 1.4 read by other session
  2. 2 Direct I/O
    1. 2.1 direct path read 
    2. 2.2 direct path write
    3. 2.3 direct path read temp
    4. 2.4 direct path write temp
  3. 3 Analyzing IO
    1. 3.1 File
    2. 3.2 Object
    3. 3.3 Object Advanced
    4. 3.4 Statement Basic
    5. 3.5 Statement Advanced
    6. 3.6 Missing Object Names
  4. 4 Tuning IO
    1. 4.1 Average IO times
    2. 4.2 Buffer Cache Hit Ratio
    3. 4.3 TOP SQL
  5. 5 direct path read
    1. 5.1 Common Causes, Diagnosis, and Actions
      1. 5.1.1 Session-Level Diagnosis
    2. 5.2 Initialization Parameters of Interest

Network Waits Contents
  1. 1 Introduction
  2. 2 SQL*Net Wait Events
    1. 2.1 SQL*Net message from client
    2. 2.2 SQL*Net message to client 
    3. 2.3 SQL*Net more data to client 
    4. 2.4 SQL*Net more data from client
    5. 2.5 SQL*Net break/reset to client
    6. 2.6 DBLINK SQL*Net Waits
  3. 3 Analysis and Tuning
    1. 3.1 SDU
    2. 3.2 RECV_BUF_SIZE and SEND_BUF_SIZE


Redo Log Waits Contents
  1. 1 Redo
  2. 2 Redo Log Wait Events
    1. 2.1 Log file Sync
      1. 2.1.1 Commit less 
      2. 2.1.2 Improve IO Speed
    2. 2.2 log buffer space
    3. 2.3 log file switch (archiving needed)
    4. 2.4 log file switch (checkpoint incomplete) 
    5. 2.5 log file switch (private strand flush incomplete)
    6. 2.6 log file switch completion 
    7. 2.7 switch logfile command
  3. 3 Redo Log Sizing Concerns and Considerations


Buffer Busy Waits Contents




IO Waits

Standard I/O





db file sequential read - single block read
db file scattered read  - multiblock read
db file parallel read   - non-contiguous multiblock read
read by other session   - wait for another session to do the IO

select parameter1, parameter2, parameter3 from v$event_name where ... 

NAME                             P1        P2      P3 
----------------------- ----------- --------- -------- 
db file sequential read       file#    block# blocks 
db file scattered read        file#    block# blocks 
db file parallel read         files    blocks requests 
read by other session         file#    block# class#

db file sequential read


  
  • Top reported wait
  • Single block read
  • block accessed via
    •  index 
    •  rowid
    •  rollback 
Example

select * from emp where empno=99;

where there is an index on emp(empno)

Algorithm:
search the buffer cache for the block;
if not found, read the block from disk via file# and block#

Note: "sequential" means a sequence as in rowid
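As a quick sketch (the wait columns were added to V$SESSION in 10g), sessions currently waiting on single block reads, along with the file# and block# being read, can be seen with:

```sql
-- Sessions currently waiting on single block reads,
-- with the file# (p1) and block# (p2) being read
select sid, sql_id, p1 file#, p2 block#, p3 blocks
from   v$session
where  event = 'db file sequential read'
and    state = 'WAITING';
```

The file#/block# pairs returned here feed directly into the object-lookup techniques shown later in this document.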




db file scattered read

  
  • Multi Block Read
    •  Full Table Scan 
    •  Index Fast Full Scan
Example

select * from emp;

Algorithm:
search the buffer cache for the blocks;
if not found, read the blocks from disk via file#, block#, and number of blocks

Note: "scattered" means the blocks are scattered throughout the buffer cache (even though they are laid out sequentially on disk)

The parameter db_file_multiblock_read_count sets the target multiblock read size
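For example, the current target read size in bytes can be checked by combining the two relevant parameters (a simple sketch):

```sql
-- db_block_size * db_file_multiblock_read_count gives the
-- target IO size in bytes for a multiblock read
select name, value
from   v$parameter
where  name in ('db_block_size', 'db_file_multiblock_read_count');
```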

db file parallel read

  
The process issues multiple single block reads in parallel
  • Documentation says this happens only during recovery
  • In practice it also happens for normal read operations
  • Asynchronous call - the session waits for all the reads to complete
Examples
  •      Non-contiguous multiblock reads
  •      Index full or range scans
  •      WHERE ... IN value lists
Example

Select * from emp where empno in (1,2,3,4,5,6);




read by other session


   Multiple sessions reading the same data that requires IO

Example

 two users doing a full table scan at the same time


Algorithm:
  • Block not found in cache
  • Read block from disk
  • Found other session already reading block from disk
  • Wait for the other session to finish IO
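The blocks behind these waits can be found from ASH, since for this event p1 is the file# and p2 is the block# (per the v$event_name listing above). A sketch:

```sql
-- Hottest file#/block# pairs that sessions waited on
-- while another session performed the IO
select count(*) cnt, p1 file#, p2 block#
from   v$active_session_history
where  event = 'read by other session'
group by p1, p2
order by count(*);
```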




Direct I/O




This mechanism lets the client bypass the buffer cache for I/O intensive operations. 
The disk blocks are written into and read from process private memory.

select parameter1, parameter2, parameter3 from v$event_name 

NAME                    P1          P2        P3
----------------------- ----------- --------- ---------
direct path read        file number first dba block cnt 
direct path read temp   file number first dba block cnt 
direct path write       file number first dba block cnt 
direct path write temp  file number first dba block cnt

Direct I/O read and write sizes are controlled by the hidden parameters

PARAMETER                  INSTANCE_VALUE
_smm_auto_min_io_size               56
_smm_auto_max_io_size               248



direct path read 

  
Seen with parallel query.

In 11g, serial full table scans of large segments can also use adaptive direct path reads, and the behavior can be forced with the hidden parameter _serial_direct_read = true.
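On a test system, the serial behavior can be toggled with the hidden parameter (unsupported; a sketch, not something to set in production):

```sql
-- Force serial full scans to use direct path reads
alter session set "_serial_direct_read" = true;

-- Revert to the default behavior
alter session set "_serial_direct_read" = false;
```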


further reading:
http://afatkulin.blogspot.com/2009/01/11g-adaptive-direct-path-reads-what-is.html

http://antognini.ch/2009/10/hints-for-direct-path-insert-statements/

http://dioncho.wordpress.com/2009/07/21/disabling-direct-path-read-for-the-serial-full-table-scan-11g/


Grouping by Query Coordinator:

select
       ash.SQL_ID,
       QC_SESSION_ID qsid,
       count(*) cnt,
       count (distinct session_id) deg,
       nvl(o.object_name,to_char(CURRENT_OBJ#))  obj,
       o.object_type otype,
       decode(session_state, 'WAITING',event,'CPU') event
from   v$active_session_history ash,
        all_objects o
where  o.object_id (+)= ash.CURRENT_OBJ#
   and qc_session_id is not null
group by qc_session_id, sql_id, o.object_name,
         o.object_type, CURRENT_OBJ#, event, session_state
Order by qc_session_id, sql_id

SQL_ID        QSID  CNT DEG OBJ          OTYPE  EVENT
------------- ---- ---- --- ------------ ----- --------------------------
7p3jt75phub2d  144  386   4 WAIT_OBJECTS TABLE  PX Deq Credit: send blkd
               144    4   3 WAIT_OBJECTS TABLE  PX qref latch
               144   37   1 WAIT_OBJECTS TABLE  db file sequential read
               144    3   2 WAIT_OBJECTS TABLE  direct path read
               144   70   1 WAIT_OBJECTS TABLE  CPU
               144   21   4 0                   PX Deq Credit: send blkd
               144   12   4 0                   db file sequential read



direct path write

  
sqlldr 
loading LOBs 
parallel DMLs 
create table as select 
create index

There seems to be a bug in 10g where direct path write waits are incorrectly recorded as CPU


 (from Metalink)
"If asynchronous IO is not being used then the IO write request blocks until completed, but this does not show as a wait at the time the IO is issued. The session returns later to pick up the completed IO data but can then show a wait on "direct path write" even though this wait will return immediately."

11g: There is a hidden parameter, _direct_path_insert_features, which seems to control direct path behaviour. alter session set "_direct_path_insert_features" = 1 turns off direct path for insert .. values().

further reading:
http://afatkulin.blogspot.com/2009/01/insert-append-into-values.html




direct path read temp

direct path write temp

  
These events happen when a sort or hash operation overflows its memory work area and has to be written to disk.
If this is a problem, consider increasing the parameter

pga_aggregate_target

The explain plan can give estimated sort size requirements, and extended row source statistics can give the actual sort size used by the SQL statement.
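For example, per-operation memory and temp usage can be pulled for a statement with the gather_plan_statistics hint; the ALLSTATS LAST format of DBMS_XPLAN adds Used-Mem and Used-Tmp columns (a sketch, using the emp table from earlier as a stand-in):

```sql
select /*+ gather_plan_statistics */ * from emp order by ename;

-- Then, in the same session, display the actual row source statistics
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```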


select  s.sid, --s.osuser, s. process, 
        s.sql_id, tmp.segtype, 
       ((tmp.blocks*8)/1024)MB, tmp.tablespace
from  
       v$tempseg_usage tmp,  
       v$session s
where tmp.session_num=s.serial#
and segtype in ('HASH','SORT')
order by blocks desc

SID SQL_ID        SEGTYPE MB TABLESPACE 
--- ------------- ------- -- ----------
 90 gm5s0vc0vzppc SORT     2 TEMP03
105 gm5s0vc0vzppc SORT     2 TEMP03
 80 gm5s0vc0vzppc SORT     2 TEMP03
105 gm5s0vc0vzppc HASH     2 TEMP03
102 gm5s0vc0vzppc SORT     2 TEMP03
 90 gm5s0vc0vzppc HASH     2 TEMP03


The following information comes from Jonathan Lewis, Randolf Geist and Joze Senegacnik:

From Oracle 10.2.0.3 through 11.1.0.6, this doesn't work:

alter session set workarea_size_policy = manual;
alter session set sort_area_size = 104857600;


broke in 10.2.0.3
fixed in 11.1.0.7

Workaround for serial connections: just run the command twice:

alter session set sort_area_size = 104857600;
alter session set sort_area_size = 104857600;

But this doesn't work for parallel sessions, which don't pick up the manual size at all and instead use the system default size (64K). You can change the system default, but that is a one-size-fits-all workaround.
This does not seem to affect HASH_AREA_SIZE. The hash_area_size parameter is an integer in bytes, so it might be limited to approximately 2GB.


On 9i the upper limit for sort area size using PGA_AGGREGATE_TARGET for a single session was fixed at 100M unless using undocumented parameters:
_pga_max_size: Maximum PGA size for a single process
               Defaults to 200M
_smm_max_size: Maximum workarea size for one process
               Defaults to
               min(0.05*pga_aggregate_target, 0.5*_pga_max_size, 100M)
               I believe this parameter is in KILOBYTES, from testing; see also the comments on
               http://dba-blog.blogspot.com/2005/08/pgamaxsize-hidden-parameter.html

In 10gR2 the 100M does not apply anymore when using a PGA_AGGREGATE_TARGET > 1000M. 
In 10.2 it is limited to 200M as long as pga_aggregate_target is smaller than 1GB.

When pga_aggregate_target is larger than 1GB, _pga_max_size = 20% of pga_aggregate_target.
The pga_aggregate_target now drives _smm_max_size in most cases:
pga_aggregate_target <= 500MB: _smm_max_size = 20% * pga_aggregate_target
pga_aggregate_target between 500MB and 1000MB: _smm_max_size = 100MB
pga_aggregate_target > 1000MB: _smm_max_size = 10% * pga_aggregate_target
and _smm_max_size in turn drives _pga_max_size: _pga_max_size = 2 * _smm_max_size
A pga_aggregate_target larger than 1000MB will now allow much higher default thresholds in 10gR2: pga_aggregate_target set to 5GB will allow an _smm_max_size of 500MB (was 100MB before) and _pga_max_size of 1000MB (was 200MB).





Analyzing IO



File

List IO by  FILE# 

set linesize 100
col block_type for a18
col objn for a25
col otype for a15
col event for a25
col blockn for 999999
col p1 for 9999
col aas for 999.99
col f_minutes new_value v_minutes
select &minutes f_minutes from dual;

select io.cnt, round(io.cnt/(&v_minutes*60),2) aas,
       io.event, io.p1 p1, f.tablespace_name
from (select count(*) cnt, substr(event,0,25) event, ash.p1 p1
       from v$active_session_history ash
       where ( event like 'db file s%' or event like 'direct%' )
          and sample_time > sysdate - &v_minutes/(60*24)
       group by event, ash.p1
      ) io,
      dba_data_files f
where f.file_id = io.p1
Order by io.cnt;


 CNT AAS EVENT                   P1 TABLESPACE
---- --- ----------------------- --- ----------
   1 .00 db file sequential read   1 SYSTEM
   2 .00 db file sequential read   3 SYSAUX
  38 .06 db file sequential read   6 SOE
 179 .30 db file sequential read   7 SOEINDEX

Object

select count(*) cnt, round(count(*)/(&minutes*60),2) aas,
       CURRENT_OBJ#||' '||o.object_name objn,
       o.object_type otype
from v$active_session_history ash, all_objects o
where ( event like 'db file s%' or event like 'direct%' )
   and o.object_id (+)= ash.CURRENT_OBJ#
   and sample_time > sysdate - &minutes/(60*24)
   and session_state='WAITING'
group by CURRENT_OBJ#, o.object_name , o.object_type
Order by count(*);


 CNT     AAS OBJN                     OTYPE
---- ------- ------------------------- ---------------
  79     .00 52949 ORDER_ITEMS         TABLE PARTITION
  97     .00 -1
 130     .00 53117 ORD_STATUS_IX       INDEX
 498     .01 53120 CUST_EMAIL_IX       INDEX
 512     .01 0
1632     .03 53112 ITEM_ORDER_IX       INDEX


Object Advanced

col block_type for a18
col obj for a20
col otype for a15
col event for a15
col blockn for 999999
col f_minutes new_value v_minutes
col p1 for 9999
col tablespace_name for a15
select &minutes f_minutes from dual;
select io.cnt cnt, io.aas aas, io.event event,
       substr(io.obj,1,20) obj, io.p1 p1, f.tablespace_name tablespace_name
from (select count(*) cnt, round(count(*)/(&v_minutes*60),2) aas,
        substr(event,0,15) event, 
        nvl(o.object_name,decode(CURRENT_OBJ#,-1,0,CURRENT_OBJ#)) obj,
        ash.p1,  o.object_type otype
       from v$active_session_history ash, all_objects o
       where ( event like 'db file s%' or event like 'direct%' )
         and o.object_id (+)= ash.CURRENT_OBJ#
         and sample_time > sysdate - &v_minutes/(60*24)
      group by  substr(event,0,15) ,  CURRENT_OBJ#, o.object_name , o.object_type , ash.p1
     ) io,
    dba_data_files f
where f.file_id = io.p1
Order by io.cnt;



CNT AAS EVENT           OBJ                P1 TABLESPACE_NAME
--- --- --------------- ----------------- --- ---------------
  1 .02 db file sequent ORDER_ITEMS_PK      7 SOEINDEX
  1 .02 db file sequent 0                   3 SYSAUX
  2 .03 db file scatter WAIT_OBJECTS        1 SYSTEM
  2 .03 db file sequent ORDER_ITEMS         6 SOE
  3 .05 db file sequent CUST_EMAIL_IX       7 SOEINDEX
  4 .07 db file sequent CUST_LNAME_IX       7 SOEINDEX
  6 .10 db file sequent ORD_ORDER_DATE_IX   7 SOEINDEX
 13 .22 db file sequent ITEM_ORDER_IX       7 SOEINDEX
 25 .42 db file sequent 0                   2 UNDOTBS1


Statement Basic

DEF v_minutes=60
col block_type for a18
col obj for a20
col objn for 999999
col otype for a15
col event for a15
col blockn for 999999
col p1 for 9999
col tablespace_name for a15
col f_minutes new_value v_minutes
select &minutes f_minutes from dual;
break on sql_id on tcnt
select sum(cnt) over ( partition by io.sql_id order by sql_id ) tcnt, io.sql_id,
          io.cnt cnt, io.aas aas, --io.event event, io.objn objn, io.obj obj,
          io.p1 p1, f.tablespace_name tablespace_name
from (select sql_id, count(*) cnt, round(count(*)/(&v_minutes*60),2) aas,
                  CURRENT_OBJ# objn, nvl(o.object_name,decode(CURRENT_OBJ#,-1,0,CURRENT_OBJ#)) obj,
                  o.object_type otype, ash.p1
           from v$active_session_history ash ,all_objects o
           where ( event like 'db file s%' or event like 'direct%' )
              and o.object_id (+)= ash.CURRENT_OBJ#
              and sample_time > sysdate - &v_minutes/(60*24)
           group by CURRENT_OBJ#, o.object_name , o.object_type , ash.p1, sql_id
        ) io,
      dba_data_files f
where f.file_id = io.p1
Order by tcnt, io.sql_id, io.cnt;


TCNT SQL_ID        CNT AAS  OBJN OBJ             P1 TABLESPAC
---- ------------- --- --- ----- --------------- -- ----------
  30 0yas01u2p9ch4   1 .01 53113 ITEM_PRODUCT_IX  7 SOEINDEX
                     1 .01 53079 ORDER_ITEMS_UK   7 SOEINDEX 
                    28 .16 53112 ITEM_ORDER_IX    7 SOEINDEX
  58 6v6gm0fd1rgrz   4 .02 54533 WAIT_OBJECTS     1 SYSTEM
                    54 .30     0 0                2 UNDOTBS1


Statement Advanced


DEF v_minutes=60
col tcnt for 9999
col aas for 999.99
-- col sql_id for
col cnt for 999
col pct for 999
col obj for a20
col sub_obj for a10
col otype for a15
col event for a15
col file# for 9999
col tablespace_name for a15
col f_minutes new_value v_minutes
select &minutes f_minutes from dual;
break on sql_id on aas on tcnt
select
       --sum(cnt) over ( partition by io.sql_id order by sql_id ) tcnt,
       round(sum(cnt) over ( partition by io.sql_id order by sql_id ) / (&v_minutes*60),2) aas,
       io.sql_id,
       --io.cnt cnt,
       100*cnt/sum(cnt) over ( partition by io.sql_id order by sql_id ) pct,
       --CURRENT_OBJ#  obj#,
       o.object_name obj,
       o.subobject_name sub_obj,
       o.object_type otype,
       substr(io.event,8,10) event,
       io.p1 file#,
       f.tablespace_name tablespace_name,
       tbs.contents
from
(
  select
        sql_id,
  event,
        count(*) cnt,
        count(*) / (&v_minutes*60) aas,
        CURRENT_OBJ# ,
        ash.p1
   from v$active_session_history ash
   where ( event like 'db file s%' or event like 'direct%' )
      and sample_time > sysdate - &v_minutes/(60*24)
   group by
       CURRENT_OBJ#,
       event,
       --o.object_name ,
       --o.object_type ,
       ash.p1,
       sql_id
) io,
   dba_data_files f
   ,all_objects o
   , dba_tablespaces tbs
where
   f.file_id = io.p1
   and o.object_id (+)= io.CURRENT_OBJ#
   and tbs.tablespace_name= f.tablespace_name
Order by
        --tcnt,
          sql_id, cnt
/
clear breaks


 AAS SQL_ID        PCT OBJ          SUB_OBJ OTYPE       EVENT      F# TABLESPAC CONTENTS
---- ------------- --- ------------ ------- ----------- ---------- -- --------- ---------
 .00 f9u2k84v884y7  33 CUSTOMERS    SYS_P27 TABLE PART  sequentia   1 SYSTEM    PERMANENT
                   33 ORDER_PK             INDEX       sequentia  4 USERS     PERMANENT
                   33                                  sequentia  2 UNDOTBS1  UNDO
.01 0tvtamt770hqz 100 TOTO1                TABLE       scattered  7 NO_ASSM   PERMANENT
.06 75621g9y3xmvd   3 CUSTOMERS    SYS_P36 TABLE PART  sequentia  4 USERS     PERMANENT
                    3 CUSTOMERS    SYS_P25 TABLE PART  sequentia  4 USERS     PERMANENT
                    3 CUSTOMERS    SYS_P22 TABLE PART  sequentia  4 USERS     PERMANENT
                    3 CUSTOMERS    SYS_P29 TABLE PART  sequentia  4 USERS     PERMANENT
                    3 CUSTOMERS    SYS_P21 TABLE PART  sequentia  4 USERS     PERMANENT
                    3 CUSTOMERS    SYS_P35 TABLE PART  sequentia  4 USERS     PERMANENT
                    3 CUSTOMERS    SYS_P32 TABLE PART  sequentia  4 USERS     PERMANENT
                    3 CUSTOMERS    SYS_P27 TABLE PART  sequentia  4 USERS     PERMANENT
                    3 CUSTOMERS    SYS_P34 TABLE PART  sequentia  4 USERS     PERMANENT
                    4 CUSTOMERS    SYS_P23 TABLE PART  sequentia  4 USERS     PERMANENT
                    5 CUSTOMERS    SYS_P33 TABLE PART  sequentia  4 USERS     PERMANENT
                   50 CUSTOMERS_PK         INDEX       sequentia  4 USERS

Missing Object Names


Notice that in some of the analyses the object name or object id is missing. Sometimes the OBJECT ID is -1 or 0 in the ASH data for I/O, so the above scripts will be missing the object name.
In this case there is a second, though more costly, way to get the object name: use the FILE# and BLOCK# being read. The FILE# and BLOCK# can be translated into the object with a query like

select  segment_name, segment_type
from     dba_extents 
where    &file  = file_id        --  File  = P1
     and &block between          --  Block = P2
              block_id and block_id + blocks - 1

This query is expensive and slow. If you will run it more than once, it's recommended to create a copy of DBA_EXTENTS, like

Create table my_extents as select * from DBA_EXTENTS;
Create index my_extents_i on my_extents(FILE_ID,BLOCK_ID);


To give an idea of just how much faster a copy of DBA_EXTENTS is, here's an example from my machine (without even creating an index)

Enter value for file: 3
Enter value for block: 6619

OWNER          SEGMENT_NAME          SEGMENT_TYPE
--------------- -------------------- ------------------
WMSYS           LOG_TAB_PK           INDEX

Elapsed: 00:00:41.25

Above I queried just one block. Below I look up over 150 objects in less time!

create table myextents as select * from dba_extents;
Table created.

Elapsed: 00:01:25.73

col own for A7
col segment_name for a40
col segment_type for a15
col cnt for 9999
Select 
      count(*) cnt, 
      owner own, 
      segment_name , 
      segment_type
from 
      v$active_session_history ash, 
      myextents ext
where
       ( event like 'db file s%' or event like 'direct%' )
    and (current_obj# in (0,-1) or current_obj# is Null)
    and sample_time > sysdate - &v_minutes/(60*24)
    and session_state='WAITING'
    and P1 = file_id
    and P2  between  block_id and block_id + blocks - 1
group by 
   owner, segment_name, segment_type
/

CNT  OWN SEGMENT_NAME    SEGMENT_TYPE
--- ---- ---------------- ------------                  
  11 SYS SMON_SCN_TO_TIME CLUSTER
 993 SYS _SYSSMU7$        TYPE2 UNDO
...

150 rows selected.

Elapsed: 00:00:01.03


Another option is querying X$BH just after the IO has occurred. X$BH reveals which blocks are in the buffer cache, so if the blocks from the recent physical IOs are still in the buffer cache we can query X$BH.
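A sketch of the X$BH approach (must be run as SYS; the X$BH columns FILE#, DBABLK, and OBJ are assumed per 10g, where OBJ holds the data object id):

```sql
-- Map a recently read file#/block# to its object, while the
-- buffer is still in the cache (run as SYS)
select o.owner, o.object_name, o.object_type
from   x$bh b, dba_objects o
where  b.obj    = o.data_object_id
and    b.file#  = &file    -- P1 from the wait event
and    b.dbablk = &block;  -- P2 from the wait event
```

This is much cheaper than DBA_EXTENTS, but only works while the block is still cached.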




Tuning IO



Check average read times per file
Should be between 5-20 ms
    ADDM expects I/Os to be under 20ms
        The parameter defaults to 10000 microseconds
        Change the behavior with DBMS_ADVISOR.SET_DEFAULT_TASK_PARAMETER('ADDM', 'DBIO_EXPECTED', <new value>)
Check time in 
    Data in Statspack under “File IO Stats” (8i and up)
    DBA_HIST_FILESTATXS for historical (10g)
    V$FILEMETRIC_HISTORY for recent (10g)
Check buffer cache hit ratio
Check db_cache_advice 9i and higher
Data in Statspack under “Buffer Pool Advisory”
Want to optimize data caching
Tune High IO SQL
Check reads off of UNDO
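The first item on the list, average read times per file, can be sketched directly from V$FILESTAT (readtim is in centiseconds, so multiply by 10 to get milliseconds):

```sql
-- Average single read time per datafile, in milliseconds
select f.file_name,
       s.phyrds,
       round(s.readtim*10 / nullif(s.phyrds,0), 2) avg_read_ms
from   v$filestat s, dba_data_files f
where  s.file# = f.file_id
order  by avg_read_ms;
```

Values consistently above the 5-20 ms range mentioned above point at the IO subsystem rather than the SQL.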


Average IO times

Statspack/AWR

File IO Stats  DB/Inst:labsf03  Snaps: 1-2
                                            Av   Mx                                      Av
                                     Av     Rd   Rd    Av                 Av  Buffer   BufWt
Tablespace          Filename Reads Reads/s (ms)  Bkt Blks/Rd  Writes Writes/s  Waits    (ms)
---------- ----------------- ----- ------- ---- ---- ------- ------- -------- ------- ------
SYSTEM     /u01/system01.dbf   445      15  0.4   16    1.0    1,157       39   2,744   93.3
USERS      /u01/users01.dbf    223       7  0.5  ###    1.0    9,725      324       4  100.0

Buffer Cache Hit Ratio

Buffer Cache Hit Ratio (BCHR) has a sordid history and it's tempting to just not mention it. For years the BCHR has been used to identify IO performance bottlenecks despite the fact that it can be completely unreliable. Connor McDonald even wrote a script (http://www.oracledba.co.uk/tips/choose.htm) that allows you to set the BCHR as high as you want without changing the size of the cache. The point being that IO bottlenecks should be identified with IO wait events and not BCHR, but if an IO bottleneck does happen, it is worth knowing if the buffer cache is supporting the IO load sufficiently. The question of whether the Buffer Cache is sufficiently sized can be addressed with Oracle's "DB Cache Advice".


Buffer Pool Advisory  
Size for   Size      Buffers for   Read      Estimated
P    Est (M) Factor     Estimate Factor Physical Reads
--- -------- ------ ------------ ------ --------------
D         56     .1        6,986    2.3         58,928
D        112     .2       13,972    1.6         42,043
D        224     .4       27,944    1.0         25,772
D        336     .6       41,916    1.0         25,715
D        448     .8       55,888    1.0         25,715
D        596    1.0       74,351    1.0         25,715
D        728    1.2       90,818    1.0         25,715
D        840    1.4      104,790    1.0         25,715
D        952    1.6      118,762    1.0         25,715
D      1,064    1.8      132,734    1.0         25,715

The Buffer Pool Advisory gives expected physical reads at different cache sizes. The column "Factor" shows size relative to the current cache; a "Factor" of 1 is the current cache size. In the above case we could actually drop the cache size to 0.4 of the current size without significantly impacting physical IO, while increasing it to 1.8 times the current size won't help at all. In this case the buffer cache size is probably fine, and resizing it won't address IO performance issues on this particular database.

TOP SQL

If the IO response time of the disks is good and if the buffer cache is correctly sized, then at this point it's clear it's time to tune the SQL with the most IO wait time:

select count(*),sql_id, event 
from v$active_session_history
where event in ('db file sequential read',
                'db file scattered read',
                'db file parallel read')
  and session_state='WAITING'
group by sql_id, event
order by count(*);

 COUNT(*) SQL_ID        EVENT
--------- ------------- ------------------------
       10 8hk7xvhua40va db file sequential read
      335 3hatpjzrqvfn7 db file sequential read
      343 0uuqgjq7k12nf db file sequential read
      549 75621g9y3xmvd db file sequential read
     1311 0bzhqhhj9mpaa db file sequential read
     1523 41zu158rqf4kf db file sequential read  
     1757 0yas01u2p9ch4 db file sequential read



direct path read

The direct path read wait event has three parameters: file#, first block#, and block count. In Oracle Database 10g, this wait event falls under the User I/O wait class. Keep the following key thoughts in mind when dealing with the direct path read wait event.

  • These are waits that are associated with direct read operations. An Oracle direct read operation reads data directly into the session’s PGA (Program Global Area), bypassing the SGA. The data in the PGA is not shared with other sessions.

  • Direct reads may be performed in synchronous or asynchronous mode, depending on the platform and the value of the DISK_ASYNCH_IO parameter. The systemwide direct path read wait event statistics can be very misleading when asynchronous I/O is used.

  • A significant number of direct path read waits is most likely an application issue.

Common Causes, Diagnosis, and Actions

The direct path read waits are driven by SQL statements that perform direct read operations from the temporary or regular tablespaces.

SQL statements with functions that require sorts, such as ORDER BY, GROUP BY, UNION, DISTINCT, and ROLLUP, write sort runs to the temporary tablespace when the input size is larger than the work area in the PGA. The sort runs in the temporary tablespace are subsequently read and merged to provide the final result. An Oracle session waits on the direct path read wait event while the sort runs are being read. SQL statements that employ the merge join, either through a hint or as decided by the optimizer, also require sort.

SQL statements that employ the hash join, either as directed by a hint or the optimizer, flush the hash partitions that don’t fit in memory to the temporary tablespace. The hash partitions in the temporary tablespace are read back into the memory to be probed for rows that match the SQL predicates. An Oracle session waits on the direct path read wait event while the hash partitions are being read.

SQL statements that employ parallel scans also contribute to the systemwide direct path read waits. In a parallel execution, the direct path read waits are associated with the query slaves and not the parent query. The session that runs the parent query waits mostly on the PX Deq: Execute Reply wait event (in Oracle8i Database and above).


Session-Level Diagnosis

It is highly unlikely that the direct path read wait event will show up as the leading bottleneck within a session, even though it may actually be one.

Therefore you shouldn't evaluate direct path read waits based on the TOTAL_WAITS or TIME_WAITED in the V$SESSION_EVENT view. Instead, you can find current sessions that perform a lot of direct read operations from the V$SESSTAT view using the following query. The physical reads direct statistic comprises the direct reads originated by the parent session itself as well as the sum of all direct reads originated by the query slaves that the parent session employs. The direct reads initiated by query slaves are only reflected in the parent session when the slaves complete their jobs. The downside to this approach is that there is no time element.

select a.name, b.sid, b.value, 
round((sysdate - c.logon_time) * 24) hours_connected
from v$statname a, v$sesstat b, v$session c
where b.sid = c.sid
and a.statistic# = b.statistic#
and b.value > 0
and a.name = 'physical reads direct'
order by b.value;


NAME                      SID      VALUE HOURS_CONNECTED
--------------------- ------- ---------- ---------------
physical reads direct       2         41             980
physical reads direct       4         41             980
physical reads direct       5     445186             980

Apart from finding the sessions that perform a lot of direct reads, you must also find out where the sessions are reading from (temporary tablespace or data files), the SQL statements that initiate the waits, and the type of segments being read. The following query gives you the answers. Sessions that read from the temporary tablespace may be reading sort or hash segments. Sessions that read from data files normally belong to parallel query slaves.

select a.event,
       a.sid,
       c.sql_hash_value hash_value,
       decode(d.ktssosegt,
              1,'SORT', 2,'HASH', 3,'DATA',
              4,'INDEX', 5,'LOB_DATA', 6,'LOB_INDEX',
              null) as segment_type,
       b.tablespace_name,
       b.file_name
from   v$session_wait a, dba_data_files b, v$session c, x$ktsso d
where  c.saddr = d.ktssoses(+)
and    c.serial# = d.ktssosno(+)
and    d.inst_id(+) = userenv('instance')
and    a.sid = c.sid
and    a.p1 = b.file_id
and    a.event = 'direct path read'
union all
select a.event,
       a.sid,
       d.sql_hash_value hash_value,
       decode(e.ktssosegt,
              1,'SORT', 2,'HASH', 3,'DATA',
              4,'INDEX', 5,'LOB_DATA', 6,'LOB_INDEX',
              null) as segment_type,
       b.tablespace_name,
       b.file_name
from   v$session_wait a, dba_temp_files b, v$parameter c,
       v$session d, x$ktsso e
where  d.saddr = e.ktssoses(+)
and    d.serial# = e.ktssosno(+)
and    e.inst_id(+) = userenv('instance')
and    a.sid = d.sid
and    b.file_id = a.p1 - c.value
and    c.name = 'db_files'
and    a.event = 'direct path read'
order by 1,2;

-- File name output is edited to fit page.
EVENT            SID HASH_VALUE SEGMENT TABLESPACE_N FILE_NAME
---------------- --- ---------- ------- ------------ -----------------
direct path read   8  511952958 SORT    TEMP_BATCH   temp_batch_01.dbf
direct path read   9 3138787393         ORDERS       orders_01.dbf
direct path read  11 3138787393         ORDERS       orders_01.dbf
direct path read  12 3138787393         ORDERS       orders_01.dbf
direct path read  14 3138787393         ORDERS       orders_01.dbf

If you catch a session reading sort segments from the temporary tablespace, this indicates the SORT_AREA_SIZE (or work area size if you use the PGA_AGGREGATE_TARGET in Oracle9i Database) is not large enough to accommodate a cache sort (or in memory sort). This is fine. It is unrealistic to expect all SQL statements to perform cache sorts. However, you should avoid multipass sorts because they create a lot of I/Os to the temporary tablespace and are very slow. How can you tell if a SQL statement is doing a multipass sort? Well, it is not that easy in versions prior to Oracle9i Database. You have to trace the session with the event 10032 and examine the trace file. However, beginning in Oracle9i Database, you can simply query the V$SQL_WORKAREA or V$SQL_WORKAREA_ACTIVE views with the SQL hash value that performs the sort.
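For the V$SQL_WORKAREA_ACTIVE check just mentioned, a sketch of the query (columns per 9i+; NUMBER_PASSES greater than 1 indicates a multipass operation):

```sql
-- In-flight work areas; number_passes > 1 means a multipass
-- sort or hash, tempseg_size shows current temp spill
select sid, operation_type, policy,
       round(actual_mem_used/1024/1024,1) mem_mb,
       number_passes,
       round(tempseg_size/1024/1024,1) temp_mb
from   v$sql_workarea_active
order  by number_passes desc, tempseg_size desc;
```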

The goal of tuning in this case is to minimize the number of sorts as a whole and, more specifically, disk sorts. Increasing the SORT_AREA_SIZE (or PGA_AGGREGATE_ TARGET) may help reduce the number of disk sorts, but that’s usually the workaround rather than the cure, unless your SORT_AREA_SIZE is unreasonably small to begin with. You should examine the application and the SQL statements to see if sorts are really necessary. Applications have the tendency to abuse the DISTINCT and UNION functions. Whenever possible use UNION ALL instead of UNION, and where applicable use HASH JOIN instead of SORT MERGE and NESTED LOOPS instead of HASH JOIN. Make sure the optimizer selects the right driving table. Check to see if the composite index’s columns can be rearranged to match the ORDER BY clause to avoid sort entirely. Also, consider automating the SQL work areas using PGA_AGGREGATE_TARGET in Oracle9i Database. Statistically, the automatic memory management delivers a higher percentage of cache sorts.


Note 

Be careful when switching from UNION to UNION ALL as this can produce different results depending on the data. The UNION operator returns only distinct rows that appear in either result, while the UNION ALL operator returns all rows.


Note 

By default, the HASH_AREA_SIZE is twice the SORT_AREA_SIZE. A larger HASH_AREA_SIZE will influence the optimizer toward hash joins (full table scans) rather than nested loops operations.

Similarly, if you catch a session reading hash segments from the temporary tablespace, all that tells you is that the HASH_AREA_SIZE (or work area size if you use PGA_AGGREGATE_TARGET in Oracle9i Database) is not big enough to accommodate the hash table in memory. The solution is similar to the one just mentioned: cure it through application and SQL tuning before adjusting the HASH_AREA_SIZE (or PGA_AGGREGATE_TARGET), unless, of course, your HASH_AREA_SIZE is too small to begin with.

If you discover that the direct reads belong to parallel query slaves, you should verify that parallel scans are appropriate for the parent SQL statement and that the degree of parallelism is right. Make sure the query slaves do not saturate your CPUs or disks. Identifying the parent SQL statement can be a bit tricky because the hash value of the parent statement is not the same as the hash value of the child statements that the query slaves execute. This query can help you identify the parent SQL statements when parallel queries are involved:


select decode(a.qcserial#, null, 'PARENT', 'CHILD') stmt_level,
a.sid,
a.serial#,
b.username,
b.osuser,
b.sql_hash_value,
b.sql_address,
a.degree,
a.req_degree
from v$px_session a, v$session b
where a.sid = b.sid
order by a.qcsid, stmt_level desc;

STMT_L SID SERIAL# USERNAME OSUSER HASH_VALUE SQL_ADDR DEG REQ_DEG
------ --- ------- -------- ------- ---------- -------- --- -------
PARENT 20 779 INTREPID cdh8455 3663187692 A0938E54
CHILD 17 3617 INTREPID cdh8455 817802256 A092E1CC 4 4
CHILD 21 370 INTREPID cdh8455 817802256 A092E1CC 4 4
CHILD 12 841 INTREPID cdh8455 817802256 A092E1CC 4 4
CHILD 14 2241 INTREPID cdh8455 817802256 A092E1CC 4 4

Initialization Parameters of Interest

The DB_FILE_DIRECT_IO_COUNT initialization parameter can impact direct path read performance. It sets the maximum I/O buffer size for direct read and write operations. Up to Oracle8i Database, the default value on most platforms is 64 blocks. So if the DB_BLOCK_SIZE is 8K, the maximum I/O buffer size for direct read and write operations is 512K. This number is further subject to hardware limits.

The DB_FILE_DIRECT_IO_COUNT parameter is hidden in Oracle9i Database, and the value is expressed in bytes instead of blocks. The default value in Oracle9i Database is 1MB. The actual direct I/O size depends on your hardware configuration and limits.
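If you want to check the parameter's current value, a commonly used sketch follows; it queries the undocumented x$ fixed tables, requires connecting as SYS, and the structure names may vary by release:

```sql
-- Hedged sketch (SYS only; x$ structures are undocumented):
-- read the hidden parameter's current value
select a.ksppinm name, b.ksppstvl value
from   x$ksppi a, x$ksppsv b
where  a.indx = b.indx
and    a.ksppinm = '_db_file_direct_io_count';
```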

You can discover the actual direct read I/O size in three ways:

  • Trace the Oracle session that performs direct read operations using trace event 10046 at level 8. The P3 parameter indicates the number of blocks read. Based on the following example, the direct path read I/O size is 64K since the block size is 8K. Alternatively, you can query the V$SESSION_WAIT view for the P3 value of the direct path read event.

    WAIT #1: nam='direct path read' ela= 4 p1=4 p2=86919 p3=8
    WAIT #1: nam='direct path read' ela= 5 p1=4 p2=86927 p3=8
    WAIT #1: nam='direct path read' ela= 10 p1=4 p2=86935 p3=8
    WAIT #1: nam='direct path read' ela= 39 p1=4 p2=86943 p3=8
    WAIT #1: nam='direct path read' ela= 5 p1=4 p2=86951 p3=8
    WAIT #1: nam='direct path read' ela= 38 p1=4 p2=86959 p3=8
    . . .
  • Trace the Unix session that performs direct read or write operations using an operating system trace facility such as truss, tusc, or strace. This snippet of a truss report from an Oracle9i Database reveals the direct I/O size is 65536 bytes, or 64K:

    9218/6:         kaio(AIONOTIFY, -14712832)                      = 0
    9218/1: kaio(AIOWAIT, 0xFFBECE98) = 1
    9218/1: lwp_cond_signal(0xFEB7BFA0) = 0
    9218/3: pread64(256, "0602\0\001\0 ~13C19AEE }".., 65536, 0x0FC26000) = 65536
    9218/1: lwp_cond_signal(0xFEB69FA0) = 0
    9218/4: pread64(256, "0602\0\001\0 ~1BC19AEFE7".., 65536, 0x0FC36000) = 65536
    . . .
  • Enable the debug information for the session that performs direct I/O operations using the trace event 10357 at level 1. Example: alter session set events '10357 trace name context forever, level 1'. The snippet of the trace file is provided here:

    Unix process pid: 4375, image: oracle@kccdeds73 (P000)
    
    *** SESSION ID:(9.18) 2004-02-08 21:47:01.908
    DBA Range Initialized: length is 1570, start dba is 0100602b
    kcblin: lbs=fc86c1cc flag=8 slot_cnt=32 slot_size=65536 state obj=24321224
    kcblin: state objects are: Call=243a2210,Current Call=243a2210, Session=24321224
    kdContigDbaDrCbk:starting from tsn 5
    kdContigDbaDrCbk:starting from rdba 0100602b
    kdContigDbaDrCbk:returning 1570 blocks
    kcblrs:issuing read on slot : 0
    kcbldio:lbs=fc86c1cc slt=fc86408c typ=0 async=1 afn=4 blk=602b cnt=8 buf=fc87fe00
    kcblrs:issuing read on slot : 1
    kcbldio:lbs=fc86c1cc slt=fc864210 typ=0 async=1 afn=4 blk=6033 cnt=8 buf=fc89fe00
    kcblcio: lbs=fc86c1cc slt=fc86408c type=0 afn=4 blk=602b cnt=8 buf=fc87fe00
    . . .

In the preceding example, the trace file belongs to query slave #0 (P000). There are 32 I/O slots available for the direct read operation (slot_cnt=32). A slot is a unit of I/O, and each slot is 65536 bytes (slot_size=65536). Asynchronous I/O is enabled during the read operation (async=1). The query slave reads data file #4 (afn=4). The number of blocks read is 8 (cnt=8). Since the block size is 8K, this translates to 65536 bytes.

In this case, the direct I/O slot size prevents the process from achieving the full 1MB, which is the default limit of the _DB_FILE_DIRECT_IO_COUNT parameter. The slot size can be modified by event 10351. The number of slots can be modified by event 10353.


Caution 

The preceding information gives you a sense of the direct I/O throughput in your system. Don’t simply change the default slot size or the number of direct I/O slots. You need to know your hardware limits before doing so. Besides, you should focus on optimizing the application and SQL statements first.

Lastly, in Oracle8i Database, direct reads can also be enabled for serial scans with the _SERIAL_DIRECT_READ initialization parameter. Earlier releases may do the same by setting event 10355. Doing so will cause data from full scans to be read into the session’s PGA and not shared with other sessions. This can sharply increase memory usage.
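A minimal sketch of enabling it for one session (hidden parameters need the double quotes, and should only be changed under Oracle Support's guidance):

```sql
-- Hedged sketch: force serial direct reads for this session only
alter session set "_serial_direct_read" = true;
```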






Network Waits

Introduction




Unfortunately, what Oracle calls "network waits" have little to do with the network and almost everything to do with the time it takes to pack messages for the network before they are sent.

Client = you: the tool, SQL*Plus, the application
The shadow process is communicating with the client

Of the three waits, only "more data" is possibly related to network issues, and even that is not clear; the other two are simply the time it takes to pack a message before sending it.

SQL*Net message to client - time to pack a message (no network time included) possibly tune SDU              
SQL*Net more data from client - possible network issues, possibly tune SDU
SQL*Net more data to client - time to pack a message (no network time included) possibly tune SDU

The same events exist for database links, where the local shadow process plays the role of the client and the remote database plays the role of the shadow process:

SQL*Net message to dblink  
SQL*Net more data from dblink - possible network issues, possibly tune SDU
SQL*Net more data to dblink 

SQL*Net Wait Events


SQL*Net message from client



Idle Event
Waiting for work from Client
Includes network transmission times for messages coming from the client
Typically indicative of Client “think time” or “processing time”

Example from Egor Starostin,  http://oracledba.ru
From a 10046 trace
    
    =====================
   PARSING IN CURSOR #1 len=43 dep=0 uid=0 oct=3 lid=0 tim=1304096237
    hv=2707617103 ad='89a03e18'
    select * from all_objects where rownum < 20
    END OF STMT
    PARSE #1:c=0,e=143,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1304096209
    EXEC #1:c=0,e=744,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1304097036
    WAIT #1: nam='SQL*Net message to client' ela= 3 driver id=1650815232
    #bytes=1 p3=0 obj#=-1 tim=1304097096
    FETCH #1:c=10000,e=6903,p=0,cr=9,cu=0,mis=0,r=1,dep=0,og=1,tim=1304104057
1->WAIT #1: nam='SQL*Net message from client' ela= 721 driver
    id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1304104865        # [non-idle]
    WAIT #1: nam='SQL*Net message to client' ela= 1 driver id=1650815232
    #bytes=1 p3=0 obj#=-1 tim=1304105319
    FETCH #1:c=0,e=627,p=0,cr=21,cu=0,mis=0,r=15,dep=0,og=1,tim=1304105524
2->WAIT #1: nam='SQL*Net message from client' ela= 253 driver
    id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1304105818        # [non-idle]
    WAIT #1: nam='SQL*Net message to client' ela= 1 driver id=1650815232
    #bytes=1 p3=0 obj#=-1 tim=1304105867
    FETCH #1:c=0,e=63,p=0,cr=6,cu=0,mis=0,r=3,dep=0,og=1,tim=1304105900
3->WAIT #1: nam='SQL*Net message from client' ela= 1960753 driver
    id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1306066946 # [idle]
    =====================
    PARSING IN CURSOR #1 len=21 dep=0 uid=0 oct=3 lid=0 tim=1306069444
    hv=2200891488 ad='89913b50'
    select user from dual
    END OF STMT
    PARSE #1:c=0,e=60,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1306069440
    ...
The first two "SQL*Net message from client" waits are in the middle of cursor processing and are considered non-idle waits.
The third "SQL*Net message from client" is between cursors and is considered an idle event, i.e., we are waiting for the next command from the client.
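To see how much time a given session has accumulated on these events, a sketch like this against V$SESSION_EVENT can help (the &sid substitution variable is a placeholder for the session of interest):

```sql
-- Hedged sketch: per-session totals for the SQL*Net events
-- (remember "message from client" is largely idle time)
select event, total_waits, time_waited
from   v$session_event
where  sid = &sid
and    event like 'SQL*Net%'
order  by time_waited desc;
```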

SQL*Net message to client 



Time it takes to pack a message to be sent to the client
Doesn’t include network timing
see Tanel Poder's analysis of SQL*Net message to client

SQL*Net more data to client 

Same as SQL*Net message to client except this is for data that spans SDU packets.
Wait represents the time it takes to pack data.
Doesn’t include network timing

SQL*Net more data from client



The only SQL*Net wait that can indicate a possible network problem
The client is sending data to the shadow process that spans packets (think large data inserts, possibly large code blocks, large SQL statements)
The shadow process waits for the next packet.
Can indicate network latency.
Can indicate a problem with the client tool

Here is an example with ASHMON where the application server died mid-stream on inserts. The shadow processes were left waiting for completion of the message. You can see the regular load on the database on the left, then just past the middle the load crashes, and all that's left is waits on "SQL*Net more data from client"



Possibly set SDU=32768 as well as setting RECV_BUF_SIZE and SEND_BUF_SIZE to 65536.

SQL*Net break/reset to client

Error in SQL statement
Control-C
Usually highlights an error in the application

Example:

CREATE TABLE T1 (C1 NUMBER);
ALTER TABLE T1 ADD 
     (CONSTRAINT T1_CHECK1 CHECK (C1 IN ('J','N')));
ALTER SESSION SET EVENTS
     '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';
INSERT INTO T1 VALUES (1);

Trace File

PARSING IN CURSOR #2 len=25 dep=0 uid=0 oct=2 lid=0 tim=5009300581224 hv=9816834
09 ad='8e6a7c10'
INSERT INTO T1 VALUES (1)
END OF STMT
PARSE #2:c=0,e=2770,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,tim=5009300581220
BINDS #2:
EXEC #2:c=0,e=128,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=5009300581418
ERROR #2:err=1722 tim=512952379
WAIT #2: nam='SQL*Net break/reset to client' ela= 31 driver id=1650815232 break?
=1 p3=0 obj#=-1 tim=5009300581549
WAIT #2: nam='SQL*Net break/reset to client' ela= 92 driver id=1650815232 break?
=0 p3=0 obj#=-1 tim=5009300581662

Unfortunately Oracle doesn't give much information for debugging unless you are tracing. If you don't trace, the SQL won't be captured because, from Oracle's point of view, the problem statement isn't an acceptable SQL statement, so there is no SQL ID to track down.

DBLINK SQL*Net Waits



These waits are the same as their client counterparts, except that the other end of the connection is a remote database:

SQL*Net message to dblink 
SQL*Net more data from dblink 
SQL*Net more data to dblink 
SQL*Net break/reset to dblink

Analysis and Tuning

There isn't much to do on the Oracle side for tuning. You can try optimizing the SDU and the SEND_BUF_SIZE and RECV_BUF_SIZE buffers.
For actual information on network speeds you will have to use an operating system tool such as ping.

SDU

The default SDU can be set in the sqlnet.ora
If it's not set, the default is 2048
The max is 32768
The default, or the value in sqlnet.ora, can be overridden in the tnsnames.ora and the listener.ora. The client and server negotiate the size, agreeing on the smaller of the two settings.
(TDU – Transmission Data Unit – see note 44694.1. The TDU parameter has been deprecated in Oracle Net v8.0 and beyond and is ignored. It is only mentioned here for backward compatibility.)


tnsnames.ora

V10G = (DESCRIPTION =
(SDU=32768)
(ADDRESS = (PROTOCOL = TCP)(HOST = fuji)(PORT = 1522))
(CONNECT_DATA =
(SERVER = DEDICATED) (SERVICE_NAME = v10g)
) )

listener.ora

SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SDU=32768)
(SID_NAME = v10g)
(ORACLE_HOME = /export/home/oracle10)
))

Tracing

sqlnet.ora

trace_level_client=16
trace_directory_client=/tmp
trace_file_client=client.trc
trace_unique_client = true
trace_level_server=16
trace_directory_server=/tmp
trace_file_server=server.trc

client.trc

client_3582.trc:[12-JAN-2008 11:37:39:237] nsconneg: vsn=313, gbl=0xa01, sdu=32768, tdu=32767

more from Jonathan Lewis at http://www.jlcomp.demon.co.uk/sdu.html

RECV_BUF_SIZE and SEND_BUF_SIZE

see: http://download.oracle.com/docs/cd/B19306_01/network.102/b14212/performance.htm

The recommended size for these buffers (from Oracle's docs) is at least

Network bandwidth * roundtrip = buffer min size

For example, if the network bandwidth is 100 Mbps and the round-trip time (from ping) is 5 ms, then

100,000,000 bits   1 byte   5 seconds
---------------- x ------ x --------- = 62,500 bytes
 1 second          8 bits     1000


tnsnames.ora

V10G = (DESCRIPTION =
(SEND_BUF_SIZE=65536)
(RECV_BUF_SIZE=65536)
(ADDRESS = (PROTOCOL = TCP)(HOST = fuji)(PORT = 1522))
(CONNECT_DATA =
(SERVER = DEDICATED) (SERVICE_NAME = v10g)
) )

listener.ora

SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SEND_BUF_SIZE=65536)
(RECV_BUF_SIZE=65536)
(SID_NAME = v10g)
(ORACLE_HOME = /export/home/oracle10)
))


sqlnet.ora

RECV_BUF_SIZE=65536
SEND_BUF_SIZE=65536




Redo Log Waits

Redo

log buffer space
log file switch (archiving needed)
log file switch (checkpoint incomplete) 
log file switch (private strand flush incomplete)
log file switch completion

 
Redo is written to disk when
User commits
Log Buffer 1/3 full (_log_io_size)
Log Buffer fills 1M
Every 3 seconds
DBWR asks LGWR to flush redo
Sessions committing wait for LGWR 


Redo Log Wait Events

Log file Sync


Wait for redo flush upon:
Commit
Rollback
Arguments
P1 = buffer# in log buffer that needs to be flushed
P2 = not used
P3 = not used
                  

Commit less 

Often possible in loops that commit on every iteration
Commit every 50 or 100 iterations instead
Batching vs single row operations

Single row operations, lots of communication, slowest
-- slowest : single row inserts, each with its own commit
insert into foo values ...;
commit;
insert into foo values ...;
commit;
insert into foo values ...;
commit;
 
          Send all the info over the wire to the database, one communication
-- For loop (assumes id and data are PL/SQL collections with n elements)
FOR i IN 1..n LOOP
     INSERT INTO foo VALUES (id(i), data(i));
END LOOP;
COMMIT;

          Send all the info over the wire to the database, one communication, bulk insert
-- FORALL
FORALL i IN 1..n
     INSERT INTO foo VALUES (id(i), data(i));
COMMIT;


Improve IO Speed

Put redo on dedicated disk
Use Raw Device or Direct IO
More Radical
Consider Ram Disks
Can stripe if redo writes are comparable to stripe size
Striping shouldn’t hurt
Striping can help
Ex: imp – can have large redo writes – can improve by 10-30%
Alternate disks for redo and archiving of redo
Possibly 10gR2
ALTER SYSTEM SET COMMIT_WRITE = 'BATCH,NOWAIT';
Commits could be lost if the machine crashes
Or on an IO error
Log file sync can also be high because of high CPU demand, when LGWR cannot get scheduled on the CPU

log buffer space


Wait for space in the redo log buffer in SGA
Solution
Increase log_buffer parameter in init.ora
Above 3MB, increasing log_buffer has little effect; if the problem persists, the bottleneck is at the disk level
Improve disk IO for redo
Faster disk
Raw file
Direct IO
Dedicated disk
p1, p2, p3 – no values  

log file switch (archiving needed)


No p1,p2,p3
Database “hangs” for transactions

archive log stop; 
-- make room in log_archive_dest 
archive log start;  



log file switch (checkpoint incomplete) 


No p1,p2,p3 args
Wait for checkpoint to complete because all log files are full
Solutions
Add more log files 
Increase size of log files 

log file switch (private strand flush incomplete)

New wait event in 10g
Like a “log file switch completion”

log file switch completion 


No p1,p2,p3
Wait for lgwr to switch log files when generating redo
Solution:
Increase redo log file size
Reduces frequency of switches
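To judge how often switches are happening, a sketch like the following counts log switches per hour from V$LOG_HISTORY:

```sql
-- Hedged sketch: log switches per hour over the last day,
-- to judge whether the redo logs are undersized
select to_char(first_time,'YYYY-MM-DD HH24') hour,
       count(*) switches
from   v$log_history
where  first_time > sysdate - 1
group  by to_char(first_time,'YYYY-MM-DD HH24')
order  by 1;
```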

What happens when a log file switch occurs:
Get next log file from control file
Get Redo Copy and Redo Allocation latch
Flush redo
Close File
Update Controlfile
Set new file to Current
Set old file to Active
If in Archivelog mode add file to archive list
Open all members of new logfile group
Write the SCN to the headers
Enable redo log generation
DBWR makes a list of blocks that need to be written out before the redo log file can be overwritten 

switch logfile command

Same as log file switch completion but the command is executed by the dba
Alter system switch logfile;

Redo Log Sizing Concerns and Considerations


What happens to recovery time if I change my redo log file sizes?
Larger redo log sizes can increase recovery time, but
there are init.ora parameters to limit this:

Standby DB: ARCHIVE_LAG_TARGET 
Seconds, limits lag between primary and standby
Increases log file switches

FAST_START_MTTR_TARGET 
Seconds to Recovery
Easy to set and accurate
Is overridden by FAST_START_IO_TARGET 
Is overridden by LOG_CHECKPOINT_INTERVAL 

alter system set fast_start_mttr_target=17 scope=both;

SQL> select ESTIMATED_MTTR from  V$INSTANCE_RECOVERY;

ESTIMATED_MTTR
--------------
            21
 






Buffer Busy Waits

Buffer busy waits on Oracle 10 and 11 happen mainly because of insert contention on tables or indexes.  There are a few other rare cases of contention on old-style RBS segments, file header blocks, and free lists.

Before Oracle 10 and 11 there was one other major cause: readers waiting for readers, i.e., one user does a physical IO of a block into memory and a second user wants to read that block. The second user waits until the IO is finished by the first user. Starting in 10g this wait has been given the name "read by other session".  Before Oracle 10g this was also a "buffer busy wait".

The easiest way to analyze the bottleneck and find a solution is to use ASH (Active Session History), available in Oracle 10g with the Diagnostics Pack license, or to use Simulated ASH for free, or a product like DB Optimizer.

The data block class, which can be found in ASH, is the most important piece of information in analyzing buffer busy waits. If we know the block class, we can determine what kind of bottleneck we are dealing with:

 If CLASS=
  1. data block
    • IF OTYPE =
      • INDEX , then the insert index leaf block is probably hot, solutions are
          • Hash partition the index
          • Use reverse key index
      • TABLE, then the insert block is hot, solutions
        • Use free lists
        • Put Object in ASSM tablespace
  2. Segment header - If "segment header" occurs at the same time as CLASS= "data block" on the same object and the object is of OTYPE= "TABLE"  then this is just a confirmation that the TABLE needs to use free lists or  ASSM.
  3. File Header Block - Most likely extent allocation problems; look at the extent size on the tablespace and increase it so that there are fewer extent allocations and less contention on the file header block.
  4. free lists - Add free list groups to the object
  5. undo header - Not enough UNDO segments, if using old RBS then switch to AUM
  6. undo block - Hot spot in UNDO, application issue 

How do we find the block class? With a quick query on the ASH data like:

select o.object_name obj,
       o.object_type otype,
       ash.sql_id,
       w.class
from v$active_session_history ash,
     ( select rownum class#, class from v$waitstat ) w,
     all_objects o
where event = 'buffer busy waits'
  and w.class#(+) = ash.p3
  and o.object_id(+) = ash.CURRENT_OBJ#
order by sample_time;

For example:

OBJ    OTYPE  SQL_ID        CLASS
------ ------ ------------- --------------
TOTO1  TABLE  8gz51m9hg5yuf data block
TOTO1  TABLE  8gz51m9hg5yuf data block
TOTO1  TABLE  8gz51m9hg5yuf segment header
TOTO1  TABLE  8gz51m9hg5yuf data block
If we find that CLASS=data block, then we will want more information to diagnose, such as the object type (OTYPE), the object name, and the kind of tablespace the object is stored in. The following query provides that information:

set linesize 120
col block_type for a20
col objn for a25
col otype for a15
col filen for 9999
col blockn for 9999999
col obj for a20
col tbs for a10
select
       bbw.cnt,
       bbw.obj,
       bbw.otype,
       bbw.sql_id,
       bbw.block_type,
       nvl(tbs.name,to_char(bbw.p1)) TBS,
       tbs_defs.assm ASSM
from (
    select
       count(*) cnt,
       nvl(object_name,CURRENT_OBJ#) obj,
       o.object_type otype,
       ash.SQL_ID sql_id,
       nvl(w.class,'usn '||to_char(ceil((ash.p3-18)/2))||' '||
                    decode(mod(ash.p3,2),
                         1,'header',
                         0,'block')) block_type,
       --nvl(w.class,to_char(ash.p3)) block_type,
       ash.p1 p1
    from v$active_session_history ash,
        ( select rownum class#, class from v$waitstat ) w,
        all_objects o
    where event='buffer busy waits'
      and w.class#(+)=ash.p3
      and o.object_id (+)= ash.CURRENT_OBJ#
      and ash.session_state='WAITING'
      and ash.sample_time > sysdate - &minutes/(60*24)
      --and w.class# > 18
   group by o.object_name, ash.current_obj#, o.object_type,
         ash.sql_id, w.class, ash.p3, ash.p1
  ) bbw,
    (select   file_id, 
       tablespace_name name
  from dba_data_files
   ) tbs,
    (select
 tablespace_name    NAME,
        extent_management  LOCAL,
        allocation_type    EXTENTS,
        segment_space_management ASSM,
        initial_extent
     from dba_tablespaces 
   ) tbs_defs
  where tbs.file_id(+) = bbw.p1
    and tbs.name=tbs_defs.name
Order by bbw.cnt
/

and the output looks like

 
  CNT OBJ     OTYPE   SQL_ID        BLOCK_TYPE       TBS        ASSM
----- ------- ------- ------------- ---------------- ---------- ------
    3 TOTO1   TABLE   8gz51m9hg5yuf segment header   NO_ASSM    MANUAL
   59 TOTO1   TABLE   8gz51m9hg5yuf data block       NO_ASSM    MANUAL


Why do buffer busy waits happen?


To put it most succinctly, buffer busy waits happen because two users want to change a block at the same time. Two users can change the same block, or even the same row, "at the same time", i.e., without committing, but that's different from the actual operation of modifying the block. The modification of the block in RAM, or computer memory, can only be done by one process at a time in order to avoid memory corruption. Different users can modify different blocks at the same time, but only one user or process can modify the same block at a time.

In order to really understand what's going on we have to take a look at how Oracle manages memory and block access and modifications.
Here is the layout:

The diagram above shows some of the essential parts of Oracle with regard to performance tuning. 
In the machine memory are
  •     Oracle's SGA, or System Global Area, a memory that is shared between Oracle users
  •     LGWR - log writer process
  •     DBWR - database writer process
  •     User1,2,3 ... - user processes, in this case "shadow processes" 
On the machine file system are
  • Redo log files
  • Data files
The SGA is composed of (among other things)
  • Log Buffer
  • Library Cache
  • Buffer Cache
What's important for understanding buffer busy waits is how the buffer cache is managed. Here is a view of the buffer cache with more components:
In order to access a block, a user (shadow process) has to get a latch (cache buffers chains latch), which protects buckets, or linked lists, of buffer headers. Once the desired header is found, the latch is released. The buffer headers point to the actual data blocks in memory. Before modifying a block in memory, a user has to lock the buffer header. The buffer header is locked any time a modification is made, whether reading a block into memory or modifying a block that is already in memory. Usually the header is locked only for a brief amount of time, but when there is a lot of concurrent access the buffer header can become a bottleneck.


BBW when reading data - read by other session
A buffer busy wait can happen on Oracle 7, 8, and 9 when one user is reading a block into memory and a second user wants to read that block. Instead of the second user trying to read that block into memory as well, they just wait for the first user to finish. Starting in Oracle 10, this kind of wait was renamed "read by other session".

BBW on insert
If multiple concurrent users are inserting into a table that doesn't have free lists and is not in an ASSM tablespace, then all users will end up inserting into the same block, the first one on the free list, and this block will become the hot block.

By adding free lists or moving the table to an ASSM tablespace we alleviate the bottleneck.
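A hedged sketch of the free lists fix (the table name is illustrative; FREELISTS only matters in a manual segment space management tablespace):

```sql
-- Hedged sketch: add free lists to an existing table
alter table foo storage (freelists 4);
```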

Multiple free lists:
The other option is ASSM, or Automatic Segment Space Management, which is set at the tablespace level.
In this case free block information is kept in Level 1 BMBs (bitmap blocks). These Level 1 BMBs are chosen by a hash on the user's process ID, thus distributing the inserts across the table. 

The inserts would look something like this (somewhat exaggerated drawing):
The ASSM BMB blocks take up more space in the table, about 1 extra block for every 16 data blocks, and there is overhead in first looking in the header/Level 3 BMB block, then going to the Level 2, then the Level 1, and finally to the data block; but all in all, ASSM is worth the reduced management costs versus free lists.

Identifying and creating ASSM tablespaces

Which tablespaces are ASSM or not?
select
        tablespace_name,
        extent_management  LOCAL,
        allocation_type    EXTENTS,
        segment_space_management ASSM,
        initial_extent
from dba_tablespaces

TABLESPACE_NAME LOCAL      EXTENTS   ASSM
--------------- ---------- --------- ------
SYSTEM          LOCAL      SYSTEM    MANUAL
UNDOTBS1        LOCAL      SYSTEM    MANUAL
SYSAUX          LOCAL      SYSTEM    AUTO
TEMP            LOCAL      UNIFORM   MANUAL
USERS           LOCAL      SYSTEM    AUTO
EXAMPLE         LOCAL      SYSTEM    AUTO
DATA            LOCAL      SYSTEM    MANUAL

creating an ASSM tablespace:

create tablespace data2 
datafile '/d3/kyle/data2_01.dbf' 
size 200M
segment space management auto;


BBW on index (because of insert)



If users are inserting data that has a rising key value, especially a monotonically rising value, then all the new inserts will have to update the leading edge leaf block of the index and with high concurrent inserts this can cause buffer busy waits.

Solutions
Hash partition the index 
Reverse Key Index
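Hedged sketches of both fixes (object names are illustrative; hash-partitioned global indexes require Oracle 10g or later):

```sql
-- Hedged sketch: reverse key index spreads sequential key values
-- across leaf blocks
create index foo_i on foo(id) reverse;

-- Hedged sketch: hash partitioning splits the hot leading edge
-- across several index segments
create index foo_i on foo(id)
  global partition by hash (id) partitions 8;
```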

BBW on old style RBS
If the block class is > 18 it's an old-style RBS segment
Select  CURRENT_OBJ#||' '||o.object_name objn,
          o.object_type otype,
          CURRENT_FILE# filen,
          CURRENT_BLOCK# blockn,
          ash.SQL_ID,
          w.class ||' '||to_char(ash.p3) block_type
from v$active_session_history ash,
      (select rownum class#, class from v$waitstat ) w,
       all_objects o
where event='buffer busy waits'
    and w.class#(+)=ash.p3
    and o.object_id (+)= ash.CURRENT_OBJ#
Order by sample_time;    

OBJN        OTYPE  FILEN  BLOCKN SQL_ID        BLOCK_TYPE
----------- ------ ------ ------ ------------- ------------
54962 TOTO1 TABLE     16   45012 8gz51m9hg5yuf data block 
54962 TOTO1 TABLE     16     161 8gz51m9hg5yuf segment header
0                     14       9 8gz51m9hg5yuf  87
0                     14       9 8gz51m9hg5yuf  87
  
If the block is of class > 18, there will be no object name, so we have to look it up ourselves to be sure:

select  segment_name,  
            segment_type
from     dba_extents 
where
     &P2  between 
    block_id and block_id + blocks - 1
     and 
     file_id = &P1 ;
    
Plug in 14 for P1 the file # and 9 for P2 the block number:

SEGMENT_NAME   SEGMENT_TYPE
-------------- --------------
R2             ROLLBACK
   
Solution
Move to AUM, or Automatic Undo Management

alter system set undo_management=auto  scope=spfile;

BBW on a file header
The ASH data has two different fields that indicate the file # and block # when the wait is a buffer busy wait.
For a buffer busy wait 
    File # = p1  *and* File # = current_file#
    Block # = P2  *and* Block # = current_block#
if  p1 != current_file#  or p2 != current_block# then use p1 and p2. They are more reliable.
for example
Time   P1  P2 OBJN OTYPE FN BLOCKN BLOCK_TYPE
----- --- --- ---- ----- -- ------ -----------------
11:44 202   2 -1          0      0 file header block
11:44 202   2 TOTO TABLE  1  60218 file header block
11:44 202   2 TOTO TABLE  1  60218 file header block
11:44 202   2 TOTO TABLE  1  60218 file header block
11:44 202   2 TOTO TABLE  1  60218 file header block
          
Notice P1 != FN (FN is CURRENT_FILE#) and P2 != BLOCKN (BLOCKN is CURRENT_BLOCK#)
The real file # is P1 = 202 and the block # is P2 = 2
In my database I only had 10 files, so what is this file # 202?!
It's a temporary tablespace data file (a tempfile)

Solution
If you are getting buffer busy waits on the file header block for a tempfile (datafile in a temporary tablespace) then try increasing the "next extent" size in the temporary tablespace.
This wait can happen when lots of extents are being allocated in the temporary tablespace.
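A sketch for checking the temporary tablespace extent settings (uniform extent sizes are set at tablespace creation, so changing them may mean recreating the temp tablespace):

```sql
-- Hedged sketch: inspect extent sizing for temporary tablespaces
select tablespace_name, allocation_type,
       initial_extent, next_extent
from   dba_tablespaces
where  contents = 'TEMPORARY';
```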

What Would ADDM do?





Interestingly enough, the ADDM page doesn't show the new load that has recently come onto the system, but the analysis is there.  I clicked on the next-to-bottom line on the page, "Read and write contention on database blocks was consuming significant database time."

Here are the outputs for the different scenarios.

 inserts into a table contention



 inserts into a table with contention on index



RBS contention


File Header Contention



What would DB Optimizer DO ?

The database below is maxed out. There is more demand for CPU than there is CPU. There is only a small amount of buffer busy waits, but even with that small amount we can drill down into the BBWs and get all the details. By clicking on "buffer busy wait" under Details we can see what SQL was running into the BBWs and what sessions were running into BBWs, and we can get all the info we want in order to analyze the BBWs under the "Analysis" tab: