Friday, August 5, 2011

Oracle Cluster wait events

What is wait event?
An event can be anything that Oracle has to perform on behalf of a set of instructions sent by the user interface. The term “wait” is used because every time a user connects to your application, a resource is allocated to perform tasks on its behalf. The waiting comes when a session is waiting for an action, sometimes from a user and at other times from the database

Wait events for Oracle can be divided as three categories
1.Time-based event
2.System-wide event
3.Session wait

Cluster Wait events
Cluster wait events are caused by multiple nodes in the cluster needing access to the same set of data. These must be tracked down on an individual basis. Usually there are multiple statements and tables that have contention problems.

Ideally, one node would only access one subset of data in order to reduce these conflicts. If possible, different types of operations should be done on each node. For example OLAP access could be targeted to node 1 while OLTP access could be targeted to node 2.

Usually there are multiple queries attempting update the same sets of data from different nodes. In order to determine what queries is producing the contention across the nodes, do the following:

Determine your beginning and ending snapshots from DBA_HIST_SNAPSHOT. Please note that we have used 4000 as beginning snapshot and 5000 as ending snapshot.

Use below SQL query to find event id
SQL> select event_id, event, count(1) cnt from dba_hist_active_sess_history where snap_id between 4000 and 5000 and wait_class_id=3871361733 and event in (’gc cr block busy’,’gc cr multi block request’) group by event_id, event order by 3;
SQL> Select sql_id, count(1) from dba_hist_active_sess_history where snap_id between 4000 and 5000 and event_id in (results from (2) above);
SQL> Select sql_text from dba_hist_sqltext where sqlid in .;

The steps above will tell you the SQL statements that cause the data contention.
The most important wait events for RAC include various categories, such as:
•gc current block 2-way
•gc current block 3-way
•gc cr block 2-way
•gc cr block 3-way
•gc current grant 2-way
•gc cr grant 2-way
•gc current block busy
•gc cr block busy
•gc current buffer busy
•gc current block congested
•gc cr block congested

The major wait events in Oracle RAC are:
1. gc cr request
2. gc buffer busy

In a RAC environment the buffer cache is global across all instances in the cluster and hence the processing differs when compared single-database instance. When a process needs to read data, Oracle will first check to see if it exists in the local buffer cache. If not, it will review global buffer cache to see if another instance already has the data in their buffer cache. If global buffer cache has data then it send a request the data and sends the data via interconnect to avoid disk read. We need to monitor the amount of requests going back and forth via interconnect.

gc cr request( global cache cr request) – This event specifies the time it takes to receive data from remote cache. The main reasons for this event are

RAC traffic using slow connection – Normally we will use high speed inter connect to transfer data between instances, however, sometimes oracle may not able to select the high-speed interconnect and instead route traffic over the slower public network. This significantly increases the wait time for gc cr request wait event.

The oradebug command can be used to verify which network is being used for RAC traffic:
SQL> oradebug setmypid
SQL> oradebug ipc

The oradebug command can be used to manually remaster an object
SQL> oradebug lkdebug -m pkey "object_id"

Inefficient queries - This is mainly due to queries which are not written/tuned properly. The inefficient query increases the amount of data blocks requested by an oracle session. Due to this more often a block will need to be read from a remote instance via interconnect, which causes the wait event.

gc buffer busy(global cache buffer busy) – This wait event is similar to the buffer busy wait event in a single-instance database and this event specifies the time the remote instance locally spends accessing the required data block. You will encounter this wait event due to hot blocks or inefficient queries.

Hot Blocks – You will encounter this issue when multiple sessions requesting a block that either not in buffer cache or it is in incompatible mode. Also heavy delete and inserts operations on hot rows alleviate the problem. Adjusting pctfree and/or pctused parameters will help.

Inefficient Queries – Due to inefficient queries the more blocks requested from the buffer cache the more likelihood of a session having to wait for other sessions. Tuning queries will result to access fewer blocks and will often result in less contention for the same block.

Satishbabu Gunukula