All About Oracle Latches

Contents:
  What Are Latches?
  Latches vs Enqueues
  How Latches Work
  Causes of Contention for Specific Latches
  Reducing Contention for Internal Latches
  Measuring Latch Contention
  Tuning Applications to Avoid Latch Contention
  Identifying Hot Blocks

What Are Latches?
Latches are serialization mechanisms that protect areas of Oracle's
shared memory (the SGA). In simple terms, latches prevent two processes
from simultaneously updating, and possibly corrupting, the same area of
the SGA. A latch is a type of lock that can be acquired and freed very
quickly. Latches are typically used to prevent more than one process
from executing the same piece of code at a given time.
Oracle sessions need to update or read from the SGA for almost all
database operations. For instance:

• When a session reads a block from disk, it must modify a free block
  in the buffer cache and adjust the buffer cache LRU chain.
• When a session reads a block from the SGA, it will modify the LRU
  chain.
• When a new SQL statement is parsed, it will be added to the library
  cache within the SGA.
• As modifications are made to blocks, entries are placed in the redo
  buffer.
• The database writer periodically writes buffers from the cache to
  disk (and must update their status from "dirty" to "clean").
• The redo log writer writes entries from the redo buffer to the redo
  logs.

Latches prevent any of these operations from colliding and possibly
corrupting the SGA.
Latches vs Enqueues
Enqueues are another type of locking mechanism used in Oracle. An
enqueue is a more sophisticated mechanism which permits several
concurrent processes to have varying degrees of sharing of "known"
resources. Any object which can be concurrently used can be protected
with enqueues. A good example is locks on tables: we allow varying
levels of sharing on tables, e.g. two processes can lock a table in
share mode or in share update mode. One difference is that the enqueue
is obtained using an OS-specific locking mechanism. An enqueue allows
the user to store a value in the lock, i.e. the mode in which we are
requesting it. The OS lock manager keeps track of the resources locked.
If a process cannot be granted the lock because it is incompatible with
the mode requested and the lock is requested with wait, the OS puts the
requesting process on a wait queue which is serviced in FIFO order.
Another difference between latches and enqueues is that with latches
there is no ordered queue of waiters as there is with enqueues. Latch
waiters may either use timers to wake up and retry, or spin (only on
multiprocessors). Since all waiters are concurrently retrying
(depending on the scheduler), anyone might get the latch, and
conceivably the first one to try might be the last one to get it.
How Latches Work
Because the duration of operations against memory is very small
(typically on the order of nanoseconds) and the frequency of latch
requests very high, the latching mechanism needs to be very
lightweight.

If the latch is already in use, Oracle can assume that it will not be
in use for long, so rather than go into a passive wait (i.e.,
relinquish the CPU and go to sleep), Oracle will retry the operation a
number of times before giving up and going into a passive wait. This
algorithm is called acquiring a spinlock, and the number of "spins"
before sleeping is controlled by the Oracle initialization parameter
_SPIN_COUNT.

The first time the session fails to acquire the latch by spinning, it
will attempt to awaken after 10 milliseconds. Subsequent waits will
increase in duration and in extreme circumstances may exceed one
second. In a system suffering from intense contention for latches,
these waits will have a severe impact on response time and throughput.
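The spin-then-sleep behavior described above can be sketched roughly as
follows. This is a conceptual Python illustration, not Oracle's actual
implementation; the names try_get, spin_count, and max_sleep are
invented for the example.

```python
import time

def acquire_latch(try_get, spin_count=2000, max_sleep=1.0):
    """Sketch of spin-then-sleep latch acquisition (illustrative only)."""
    # Spin phase: retry cheaply up to _SPIN_COUNT times, hoping the
    # current holder releases the latch within a few instructions.
    for _ in range(spin_count):
        if try_get():
            return
    # Sleep phase: yield the CPU, starting at one centisecond and
    # doubling the wait after each failed round of spinning.
    sleep = 0.01
    while True:
        time.sleep(sleep)
        for _ in range(spin_count):
            if try_get():
                return
        sleep = min(sleep * 2, max_sleep)
```

The exponential backoff is why heavy latch contention shows up as both
extra CPU usage (the spinning) and long waits (the doubling sleeps).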
Causes of Contention for Specific Latches
If a required latch is busy, the process requesting it spins, tries
again and, if it is still not available, spins again. The loop is
repeated up to a maximum number of times determined by the
initialization parameter _SPIN_COUNT. If after this entire loop the
latch is still not available, the process must yield the CPU and go to
sleep. Initially it sleeps for one centisecond. This time is doubled in
every subsequent sleep. This causes a slowdown and results in
additional CPU usage until the latch becomes available. The CPU usage
is a consequence of the "spinning" of the process: the process
continues to check for the availability of the latch at intervals,
sleeping in between.
The latches that most frequently affect performance are those
protecting the buffer cache, areas of the shared pool and the redo
buffer.
• Library cache and shared pool latches: These latches protect the
  library cache, in which sharable SQL is stored. In a well-designed
  application there should be little or no contention for these
  latches, but in an application that uses literals instead of bind
  variables (for instance "WHERE surname='HARRISON'" rather than
  "WHERE surname=:surname"), library cache contention is common.
• Redo copy/redo allocation latches: These latches protect the redo
  log buffer, which buffers entries made to the redo log. Recent
  improvements (from Oracle 7.3 onwards) have reduced the frequency and
  severity of contention for these latches.
• Cache buffers chains latches: These latches are held when sessions
  read or write buffers in the buffer cache. There are typically a very
  large number of these latches, each of which protects only a handful
  of blocks. Contention on these latches is typically caused by
  concurrent access to a very "hot" block, and the most common type of
  such a hot block is an index root or branch block (since any
  index-based query must access the root block).
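To picture why a single hot block causes contention on one of these
latches, here is a minimal sketch (an analogy only, not Oracle
internals) of a buffer cache organized as hash chains, each guarded by
its own child latch. All names and the chain count are invented for the
example.

```python
import threading

N_CHAINS = 8  # illustrative; a real instance has many child latches

chains = [[] for _ in range(N_CHAINS)]                 # one buffer chain each
latches = [threading.Lock() for _ in range(N_CHAINS)]  # one child latch each

def chain_for(file_id, block_id):
    # A block's address always hashes to the same chain, so every
    # session touching one hot block serializes on the same child latch.
    return hash((file_id, block_id)) % N_CHAINS

def read_block(file_id, block_id):
    i = chain_for(file_id, block_id)
    with latches[i]:                 # acquire the chain's child latch
        for buf in chains[i]:        # scan the chain for a cached buffer
            if buf[:2] == (file_id, block_id):
                return buf
        buf = (file_id, block_id, b"block data")  # simulated disk read
        chains[i].append(buf)
        return buf
```

Adding more latches (more chains) does not help here: a single hot
block still maps to exactly one chain, which is why the fix is usually
at the application or schema level.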
Reducing Contention for Internal Latches
We can reduce contention for these latches and tune them by adjusting
certain init.ora parameters.
Cache buffers chains latch:
Contention on this latch might be related to the buffer cache size, but
it may also be due to a "hot block" (a block that is very heavily
accessed). Before increasing the parameter DB_BLOCK_BUFFERS, check that
specific blocks are not causing the contention, to avoid wasting
memory.
Cache buffers LRU chain latch:
Using multiple buffer pools and adjusting the parameter
DB_BLOCK_LRU_LATCHES to create multiple LRU latches will help reduce
contention on this latch.
Redo allocation latch:
Contention for this latch in Oracle7 can be reduced by decreasing the
value of LOG_SMALL_ENTRY_MAX_SIZE on multi-CPU systems to force the use
of the redo copy latch. In Oracle8i this parameter is obsolete, so you
need to consider increasing the size of the LOG_BUFFER or reducing the
load on the log buffer by using NOLOGGING features when possible.
Redo copy latch:
This latch is waited for on both single- and multi-CPU systems. On
multi-CPU systems, contention can be reduced by increasing the value of
LOG_SIMULTANEOUS_COPIES (hidden in Oracle8i) and/or increasing
LOG_ENTRY_PREBUILD_THRESHOLD (undocumented in Oracle7).
Row cache objects latch:
In order to reduce contention for this latch, we need to tune the data
dictionary cache. In Oracle7 this basically means increasing the size
of the shared pool (SHARED_POOL_SIZE), as the dictionary cache is a
part of the shared pool.
Library cache and shared pool latches:
The first step in reducing contention on these latches is to ensure
that the application reuses SQL statements as much as possible. If the
application is already tuned, SHARED_POOL_SIZE can be increased. Be
aware that if the application is not using the library cache
appropriately, the contention might be worse with a larger structure to
manage.
Measuring Latch Contention

Wait interface-based techniques (the better way)
The best approach to estimating the impact of latch contention is to
consider the relative amount of time being spent waiting for
latches.
The following query gives us some indication of this:
SELECT substr(event, 1, 50) event,
       time_waited,
       round(time_waited * 100 / SUM(time_waited) OVER (), 2) wait_pct
  FROM (SELECT event, time_waited
          FROM v$system_event
         WHERE event NOT IN
               ('Null event', 'client message', 'rdbms ipc reply',
                'smon timer', 'rdbms ipc message', 'PX Idle Wait',
                'PL/SQL lock timer', 'file open', 'pmon timer',
                'WMON goes to sleep', 'virtual circuit status',
                'dispatcher timer', 'SQL*Net message from client',
                'parallel query dequeue wait', 'pipe get')
        UNION
        (SELECT name, value
           FROM v$sysstat
          WHERE name LIKE 'CPU used when call started'))
 ORDER BY 2 DESC;
EVENT                           TIME_WAITED   WAIT_PCT
------------------------------  -----------  ---------
latch free                            40144      31.67
CPU used when call started            30341      23.94
control file sequential read          12341       9.74
direct path read                      11933       9.41
control file parallel write            6487       5.12
file identify                          5666       4.47
log file sync                          3492       2.75
log file parallel write                3213       2.53
instance state change                  3064       2.42
log file switch completion             3049       2.41
db file sequential read                2290       1.81
Now we can look at the sleeps in v$latch to determine which latches are
likely to be contributing most to this problem:

select substr(name, 1, 35) name,
       gets,
       sleeps,
       sleeps * 100 / sum(sleeps) over () sleep_pct,
       sleeps * 100 / gets sleep_rate
  from v$latch
 where gets > 0
 order by sleeps desc;
NAME                                   GETS      SLEEPS  SLEEP_PCT  SLEEP_RATE
------------------------------  -----------  ----------  ---------  ----------
cache buffers chains             13,863,552      38,071      99.48       .2746
session allocation                1,223,982         110        .29       .0090
checkpoint queue latch            1,461,039          39        .10       .0027
library cache                     9,239,751          22        .06       .0002
shared pool                         869,652          16        .04       .0018
messages                          2,130,515           6        .02       .0003
redo writing                      1,330,987           6        .02       .0005
latch wait list                          33           0        .00       .0000
session switching                     8,014           0        .00       .0000
session idle bit                  2,428,851           0        .00       .0000
enqueues                          3,171,018           0        .00       .0000
channel handle pool latch                40           0        .00       .0000
message pool operations parent            3           0        .00       .0000
Now we are in a position to make some reasonable conclusions:

• Latch sleeps contribute about 30% of database response time (very
  excessive), AND
• It's the cache buffers chains latches that contribute the vast
  majority of these waits.

Note that if we had used the conventional "ratio based" analysis
outlined in the previous section, we would have discounted the cache
buffers chains latches as a problem because the miss rate was "only"
0.15%.
Tuning the Application to Avoid Latch Contention
There are some things we can do within our application design that can
reduce contention for latches.
Using Bind Variables
As noted earlier, failure to use bind variables within an application
is the major cause of library cache latch contention. All Oracle
applications should make use of bind variables whenever possible.

However, all is not lost if you are unable to modify your application
code. From 8.1.6 onwards you can use the CURSOR_SHARING parameter to
cause Oracle to modify SQL on the fly to use bind variables. A setting
of FORCE causes all literals to be converted to bind variables. A
setting of SIMILAR causes statements to be rewritten only if doing so
would not change the statement's execution plan (which can happen if
there are histogram statistics defined on a column referenced in the
WHERE clause). The default value for this parameter is EXACT.
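The difference can be illustrated with Python's sqlite3 module (used
here only as a stand-in; the shape of the problem is the same for any
shared-SQL cache): literal SQL produces a distinct statement text for
every value, while a bind variable (placeholder) produces one sharable
text. The table and data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (surname TEXT)")
conn.executemany("INSERT INTO emp VALUES (?)", [("HARRISON",), ("SMITH",)])

surnames = ["HARRISON", "SMITH"]

# Literal style: every distinct value yields a distinct statement text,
# so a shared-SQL cache must parse and store each one separately.
literal_texts = {f"SELECT surname FROM emp WHERE surname = '{s}'"
                 for s in surnames}

# Bind-variable style: one statement text serves all executions.
bound_text = "SELECT surname FROM emp WHERE surname = ?"
rows = [conn.execute(bound_text, (s,)).fetchone() for s in surnames]
```

With thousands of distinct literal values, the literal style floods the
library cache with single-use statements, which is exactly what drives
the latch contention described above.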
Avoiding Hot Blocks
Cache buffers chains latch contention is one of the most intractable
types of latch contention. There are a couple of things you can do at
the application level to reduce the severity of this type of
contention.

Firstly, identify the blocks that are "hot." Metalink note 163424.1,
"How to Identify a Hot Block Within The Database", describes how to do
this. Having identified the hot block, you will most likely find that
it is an index root or branch block. If this is the case, there are two
application design changes that may help.
1) Consider partitioning the table and using local indexes. This might
allow you to spread the heat among multiple indexes (you will probably
want to use hash partitioning to ensure an even spread of load among
the partitions).

2) Consider converting the table to a hash cluster keyed on the columns
of the index. This allows the index to be bypassed completely and may
also result in some other performance improvements. However, hash
clusters are suitable only for tables of relatively static size, and
determining an optimal setting for the SIZE and HASHKEYS storage
parameters is essential.
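The idea behind option 1 can be sketched as follows (illustrative
Python only; Oracle's hash partitioning uses its own internal hash
function, and the partition count here is invented): each key maps
deterministically to one partition, so each partition's local index
root absorbs only a fraction of the accesses.

```python
from collections import Counter

N_PARTITIONS = 4  # illustrative number of hash partitions

def partition_for(key):
    # Hash partitioning: the key deterministically picks one partition,
    # each of which has its own local index (and its own root block).
    return hash(key) % N_PARTITIONS

# 10,000 lookups that would all have hit one index root now spread
# across four local index roots.
hits = Counter(partition_for(emp_id) for emp_id in range(10_000))
```

The heat on any one root block drops roughly in proportion to the
number of partitions, at the cost of probing one partition per lookup.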
Identifying Hot Blocks

Metalink Note:163424.1
Goals:
• How to identify blocks which cause latch contention on the 'cache
  buffers chains' latch.
• How to identify a hot block within the database buffer cache.
Possible hot blocks in the buffer cache can normally be identified by a
high or rapidly increasing wait count on the CACHE BUFFERS CHAINS
latch. This latch is acquired when searching for data blocks cached in
the buffer cache. Since the buffer cache is implemented as a collection
of chains of blocks, each of those chains is protected by a child of
this latch when it needs to be scanned. Contention on this latch can be
caused by very heavy access to a single block, which can require the
application to be reviewed.
By examining the waits on this latch, information about the segment and
the specific block can be obtained using the following queries.

First determine which latch id (ADDR) is interesting by examining the
number of sleeps for this latch. The higher the sleep count, the more
interesting the latch id (ADDR) is:
select CHILD# "cCHILD",
       ADDR "sADDR",
       GETS "sGETS",
       MISSES "sMISSES",
       SLEEPS "sSLEEPS"
  from v$latch_children
 where name = 'cache buffers chains'
 order by 4, 1, 2, 3;
Run the above query a few times to establish the id (ADDR) that has the
most consistent number of sleeps. Once the id (ADDR) with the highest
sleep count is found, this latch address can be used to get more
details about the blocks currently in the buffer cache protected by
this latch. The query below should be run just after determining the
ADDR with the highest sleep count.
column segment_name format a35

select /*+ RULE */
       e.owner || '.' || e.segment_name segment_name,
       e.extent_id extent#,
       x.dbablk - e.block_id + 1 block#,
       x.tch,
       l.child#
  from sys.v$latch_children l,
       sys.x$bh x,
       sys.dba_extents e
 where x.hladdr = 'ADDR'
   and e.file_id = x.file#
   and x.hladdr = l.addr
   and x.dbablk between e.block_id and e.block_id + e.blocks - 1
 order by x.tch desc;
SEGMENT_NAME                   EXTENT#        BLOCK#     TCH     CHILD#
---------------------     ------------  ------------  ------  ---------
SCOTT.EMP_PK                         5           474      17      7,668
SCOTT.EMP                            1           449       2      7,668
Depending on the TCH column (the number of times the block is hit by a
SQL statement), you can identify a hot block. The higher the value of
the TCH column, the more frequently the block is accessed by SQL
statements.
In order to reduce contention for this object, the following mechanisms
can be put in place:

1) Examine the application to see if the execution of certain DML and
SELECT statements can be reorganized to eliminate contention on the
object.

2) Decrease the buffer cache (although this may only help in a small
number of cases).

3) DBWR throughput may be a factor in this as well. If using multiple
DBWRs, then increase the number of DBWRs.

4) Increase PCTFREE for the table storage parameters via ALTER TABLE or
a rebuild. This will result in fewer rows per block.

5) Consider implementing reverse key indexes (if range scans aren't
commonly used against the segment).
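The reverse key idea in option 5 can be sketched as follows
(illustrative Python, not Oracle's storage format): reversing the bytes
of each key makes consecutive values, such as sequence numbers, land
far apart in index order, so inserts spread across many leaf blocks
instead of all hitting the rightmost one.

```python
def reverse_key(key, width=4):
    # A reverse key index stores each key with its bytes reversed, so
    # consecutive keys no longer sort next to each other in the index.
    return int.from_bytes(key.to_bytes(width, "big")[::-1], "big")

# Consecutive sequence values would all target the same rightmost leaf
# block; their reversed forms are roughly 16 million apart in key
# order, landing in widely separated leaf blocks.
keys = [1001, 1002, 1003, 1004]
reversed_keys = [reverse_key(k) for k in keys]
```

The trade-off noted in the text follows directly: because adjacent keys
are scattered, index range scans over the original key order are no
longer efficient.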