Submit a ticketCall us

Welcome to the NEW Success Center. Search all resources (documentation, videos, training, knowledge base articles) or browse resources by product. If you are unable to find what you are looking for, please contact us at customersuccess@solarwinds.com

 

 

 

 

Home > Success Center > Database Performance Analyzer (DPA) > Mutex lock when gathering Oracle statistics

Mutex lock when gathering Oracle statistics

Overview

Due to several Oracle bugs, Ignite can encounter 'Library Cache Mutex X' waits on the Monitored Database Instance when retrieving SQL Statistics.  There are over 30 documented bugs associated with this wait. This article describes what Mutexes are, the Mutex waits and the many issues that may be encountered.

 

If Ignite encounters Mutex wait or acquires a Mutex to get Statistics, the query to retrieve SQL statistics may timeout, producing a "socket read timed out" error in the Ignite logs.  This error in turn causes the Ignite monitor to try to restart.  In some environments, the Oracle session which has the Mutex where the timed out occurred,  is orphaned by Ignite.  This means that the Oracle Instance still sees the session as active, doesn't know that it's orphaned and tries to continue processing the session's request.  At this point, if you attempt to kill the ophaned session using the Oracle "alter system kill" command, the system may hang because the Oracle Process Monitor (PMON) also has to aquire a Mutex in order to clean up the session.  If PMON is in the 'Library Cache Mutex X' wait, the system will hang indefinitely and the only quick resolution is to restart the database instance. 

 

Note: A better way to clean up the orphaned session is to issue an OS kill command on the system process id.  For Linux/Unix sessions, you will need to issue a 'kill -9 pid', or for Windows, 'orakill pid' at a command prompt.  This will kill the underlying OS process so the session is not seen as 'active' in Oracle, and PMON will correctly clean it up without the Mutex wait.

Environment

  • Ignite 8.2.27 and earlier

Resolution

A. Upgrade Oracle.  The version affected by these issues are: Oracle Server - Enterprise Edition - Version: 10.1.0.2 to 11.2.0.3 - Release: 10.1 to 11.2  Information in this document applies to any platform.

 

B. If you cannot upgrade Oracle to a version that does not exhibit the bug, you can do the following in Ignite:

  • In Ignite 8.2.27 and below, the solution is to disable Ignite statistics collection. To do this: In Ignite > Options > Advanced Options >  select the 'Support Option' box > Choose the Monitored Instance from the Dropdown Box > Set ENABLE_SQLSTATS to 'false'.
  • Upgrade to Ignite 8.2.28 or higher, which will automatically disable statistics collection if an Ignite query against x$kglcursor appears to be hung.  Ignite detects this by issuing the following query (if it returns any results, it means that there is a hung statistics query):
        SELECT a.sql_id, a.event, a.p1text, a.p1, a.p2text, a.p2, a.last_call_et, a.status, a.state
         FROM v$session a, v$sql b
         WHERE a.program = 'IgniteMonitor'
         AND a.sql_id = b.sql_id
         AND a.status = 'ACTIVE'
         AND a.last_call_et > 420
         AND b.sql_text like '%FROM x$kglcursor%';
  • The following system property changes the connection timeout value in the query for the last_call_et comparison
    • com.confio.idc.monitor.stats.orphaned.connection.time=420
Last modified
19:16, 22 Jun 2016

Tags

Classifications

Public