写点什么

故障诊断:ASM 莫名出现 GC 等待事件、ADG 的 MRP 进程 HANG 住

  • 2025-06-13
    福建
  • 本文字数:14125 字

    阅读完需:约 46 分钟

ASM 环境中有 GC 的等待事件的案例很少,今天就为大家分享一个老的案例,分享这个案例的本质不是想看最后的解决方案和定位的 BUG,而是想分享一下 ASM 实例中的异常等待事件的分析方法其实跟普通数据库里时一样的,仍然可以通过 hanganalyze 和 systemstate 来分析,接下来我们就一起看看这个案例


1.环境介绍


这里大概的描述一下版本,有点老,11.2.0.4.2 的版本,现在很少能看见这样的版本了。


Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit ProductionWith the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,Data Mining and Real Application Testing optionsORACLE_HOME = /u01/oracle/product/11.2.0System name:	AIX
复制代码


2.对 ASM 实例做 HANGANALYZE 与 SYSTEMSTATE


这里对 ASM 实例做下面的操作不会生成很多的 TRACE 文件。


oradebug setmypidoradebug unlimitoradebug hanganalyze 3oradebug dump systemstate 266
复制代码


3.KILL MRP 进程


ASM 环境中还运行着数据库备库,此时通过 recover managed standby database cancel 想取消日志前滚,但是不能正常取消,唯有手动 KILL MRP 进程。KILL 后再执行 RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;日志前滚又恢复正常。


4.分析 HANGANALYZE 与 SYSTEMSTATE 日志


4.1 查看 hanganalyze 部分


*** -03-31 20:09:59.453===============================================================================HANG ANALYSIS:  instances (db_name.oracle_sid): cdhtz_p780.cdhtz1  oradebug_node_dump_level: 3  os thread scheduling delay history: (sampling every 1.000000 secs)    0.000000 secs at [ 20:09:58 ]      NOTE: scheduling delay has not been sampled for 0.818740 secs    0.000000 secs from [ 20:09:54 - 20:09:59 ], 5 sec avg    0.000000 secs from [ 20:09:00 - 20:09:59 ], 1 min avg    0.000000 secs from [ 20:05:00 - 20:09:59 ], 5 min avg  vktm time drift history=============================================================================== Chains most likely to have caused the hang: [a] Chain 1 Signature: 'gc buffer busy release'<='buffer busy waits'     Chain 1 Signature Hash: 0xe473997a [b] Chain 2 Signature: 'parallel recovery slave next change'<='buffer busy waits'     Chain 2 Signature Hash: 0x16abd035 [c] Chain 3 Signature: 'gc buffer busy release'<='buffer busy waits'     Chain 3 Signature Hash: 0xe473997a ===============================================================================Non-intersecting chains: -------------------------------------------------------------------------------Chain 1:-------------------------------------------------------------------------------    Oracle session identified by:    {                instance: 1 (cdhtz_p780.cdhtz1)                   os id: 6226628              process id: 98, oracle@cdhtzrzdb1              session id: 255        session serial #: 451    }    is waiting for 'buffer busy waits' with wait info:    {                      p1: 'file#'=0x7                      p2: 'block#'=0x229d                      p3: 'class#'=0x62            time in wait: 0.944766 sec (last interval)            time in wait: 76 min 54 sec (total)           timeout after: never                 wait id: 135153                blocking: 0 sessions            wait history:              * time between current wait and wait #1: 0.000000 sec              1.       event: 'latch: cache buffers chains'                 time waited: 0.000017 sec                     wait id: 152253          p1: 'address'=0x7000103deef8f58                                              p2: 'number'=0xb1                                              p3: 'tries'=0x0              * time between wait #1 and #2: 0.000000 sec              2.       event: 'buffer busy waits'                 time waited: 4.345775 sec (last interval)                 time waited: 76 min 53 sec (total)                     wait id: 135153          p1: 'file#'=0x7                                              p2: 'block#'=0x229d                                              p3: 'class#'=0x62              * time between wait #2 and #3: 0.000000 sec              3.       event: 'latch: cache buffers chains'                 time waited: 0.000012 sec                     wait id: 152252          p1: 'address'=0x7000103deef8f58                                              p2: 'number'=0xb1                                              p3: 'tries'=0x0    }    and is blocked by => Oracle session identified by:    {                instance: 1 (cdhtz_p780.cdhtz1)                   os id: 5767982              process id: 77, oracle@cdhtzrzdb1 (PR0E)              session id: 633        session serial #: 51    }    which is waiting for 'gc buffer busy release' with wait info:    {                      p1: 'file#'=0x7                      p2: 'block#'=0x229d                      p3: 'class#'=0x62            time in wait: 0.923282 sec      heur. time in wait: 24 min 49 sec           timeout after: 0.076718 sec                 wait id: 258197465                blocking: 2 sessions            wait history:              * time between current wait and wait #1: 0.000021 sec              1.       event: 'gc buffer busy release'                 time waited: 1.000014 sec                     wait id: 258197464       p1: 'file#'=0x7                                              p2: 'block#'=0x229d                                              p3: 'class#'=0x62              * time between wait #1 and #2: 0.000012 sec              2.       event: 'gc buffer busy release'                 time waited: 1.000015 sec                     wait id: 258197463       p1: 'file#'=0x7                                              p2: 'block#'=0x229d                                              p3: 'class#'=0x62              * time between wait #2 and #3: 0.000021 sec              3.       event: 'gc buffer busy release'                 time waited: 1.000039 sec                     wait id: 258197462       p1: 'file#'=0x7                                              p2: 'block#'=0x229d                                              p3: 'class#'=0x62    } Chain 1 Signature: 'gc buffer busy release'<='buffer busy waits'Chain 1 Signature Hash: 0xe473997a------------------------------------------------------------------------------- -------------------------------------------------------------------------------Chain 2:-------------------------------------------------------------------------------    Oracle session identified by:    {                instance: 1 (cdhtz_p780.cdhtz1)                   os id: 5963976              process id: 89, oracle@cdhtzrzdb1              session id: 2148        session serial #: 1613    }    is waiting for 'buffer busy waits' with wait info:    {                      p1: 'file#'=0x3                      p2: 'block#'=0x80                      p3: 'class#'=0x11            time in wait: 67 min 44 sec           timeout after: never                 wait id: 2843                blocking: 0 sessions            wait history:              * time between current wait and wait #1: 0.000077 sec              1.       event: 'db file sequential read'                 time waited: 0.006262 sec                     wait id: 2842            p1: 'file#'=0x2                                              p2: 'block#'=0x2202                                              p3: 'blocks'=0x1              * time between wait #1 and #2: 0.000098 sec              2.       event: 'db file sequential read'                 time waited: 0.000507 sec                     wait id: 2841            p1: 'file#'=0x2                                              p2: 'block#'=0x21ea                                              p3: 'blocks'=0x1              * time between wait #2 and #3: 0.000066 sec              3.       event: 'db file sequential read'                 time waited: 0.000380 sec                     wait id: 2840            p1: 'file#'=0x2                                              p2: 'block#'=0x21e2                                              p3: 'blocks'=0x1    }    and is blocked by => Oracle session identified by:    {                instance: 1 (cdhtz_p780.cdhtz1)                   os id: 8127092              process id: 84, oracle@cdhtzrzdb1 (PR0L)              session id: 1516        session serial #: 37    }    which is waiting for 'parallel recovery slave next change' with wait info:    {            time in wait: 0.008511 sec      heur. time in wait: 76 min 57 sec           timeout after: 0.001489 sec                 wait id: 258195905                blocking: 1 session            wait history:              * time between current wait and wait #1: 0.000008 sec              1.       event: 'parallel recovery slave next change'                 time waited: 0.010020 sec                     wait id: 258195904                     * time between wait #1 and #2: 0.000004 sec              2.       event: 'parallel recovery slave next change'                 time waited: 0.010013 sec                     wait id: 258195903                     * time between wait #2 and #3: 0.000005 sec              3.       event: 'parallel recovery slave next change'                 time waited: 0.010016 sec                     wait id: 258195902           } Chain 2 Signature: 'parallel recovery slave next change'<='buffer busy waits'Chain 2 Signature Hash: 0x16abd035------------------------------------------------------------------------------- ===============================================================================Intersecting chains: -------------------------------------------------------------------------------Chain 3:-------------------------------------------------------------------------------    Oracle session identified by:    {                instance: 1 (cdhtz_p780.cdhtz1)                   os id: 3146372              process id: 92, oracle@cdhtzrzdb1              session id: 2524        session serial #: 391    }    is waiting for 'buffer busy waits' with wait info:    {                      p1: 'file#'=0x7                      p2: 'block#'=0x229d                      p3: 'class#'=0x62            time in wait: 7.613869 sec (last interval)            time in wait: 76 min 50 sec (total)           timeout after: never                 wait id: 109934                blocking: 0 sessions            wait history:              * time between current wait and wait #1: 0.000000 sec              1.       event: 'latch: cache buffers chains'                 time waited: 0.000003 sec                     wait id: 126021          p1: 'address'=0x7000103deef8f58                                              p2: 'number'=0xb1                                              p3: 'tries'=0x0              * time between wait #1 and #2: 0.000000 sec              2.       event: 'buffer busy waits'                 time waited: 3.531329 sec (last interval)                 time waited: 76 min 42 sec (total)                     wait id: 109934          p1: 'file#'=0x7                                              p2: 'block#'=0x229d                                              p3: 'class#'=0x62              * time between wait #2 and #3: 0.000000 sec              3.       event: 'latch: cache buffers chains'                 time waited: 0.000018 sec                     wait id: 126020          p1: 'address'=0x7000103deef8f58                                              p2: 'number'=0xb1                                              p3: 'tries'=0x0    }    and is blocked by 'instance: 1, os id: 5767982, session id: 633',    which is a member of 'Chain 1'. Chain 3 Signature: 'gc buffer busy release'<='buffer busy waits'Chain 3 Signature Hash: 0xe473997a-------------------------------------------------------------------------------
复制代码


这里可以连接到数据库进程的阻塞关系。


4.2 使用 ass 格式化 systemstate


awk 脚本可以让我们快速的了解 systemstate 中进程的等待事件和阻塞信息,可加速我们分析问题的效率。这里建议采用最新版本的 ass1045 的 awk,它能展示更多的内容。


[oracle@www.cdhtz.com ]$ awk -f ~/rs/sql/ass1045.awk cdhtz_ora_5046602.trc
Starting Systemstate 1...............................................................................................Ass.Awk Version 1.0.45~~~~~~~~~~~~~~~~~~~~~~Source file : cdhtz_ora_5046602.trc
System State 1 (2025-03-31 20:10:07.375)~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~1: [DEAD] 2: waiting for 'pmon timer' 3: waiting for 'rdbms ipc message' 4: waiting for 'VKTM Logical Idle Wait' 5: waiting for 'rdbms ipc message' 6: waiting for 'DIAG idle wait' 7: waiting for 'rdbms ipc message' 8: waiting for 'PING' 9: waiting for 'rdbms ipc message' 10: waiting for 'DIAG idle wait' 11: waiting for 'rdbms ipc message' 12: waiting for 'ges remote message' 13: waiting for 'gcs remote message' 14: waiting for 'gcs remote message' 15: waiting for 'rdbms ipc message' 16: waiting for 'GCR sleep' 17: waiting for 'rdbms ipc message' 18: waiting for 'rdbms ipc message' 19: waiting for 'rdbms ipc message' 20: waiting for 'rdbms ipc message' 21: waiting for 'rdbms ipc message' 22: waiting for 'rdbms ipc message' 23: waiting for 'smon timer' 24: waiting for 'rdbms ipc message' 25: waiting for 'rdbms ipc message' 26: waiting for 'ASM background timer' 27: waiting for 'rdbms ipc message' 28: waiting for 'rdbms ipc message' 29: waiting for 'RFS dispatch' 30: waiting for 'SQL*Net message from client' 31: waiting for 'wait for unread message on broadcast channel' 32: waiting for 'SQL*Net message from client' 33: waiting for 'rdbms ipc message' 34: waiting for 'rdbms ipc message' 35: waiting for 'rdbms ipc message' 36: waiting for 'class slave wait' 37: waiting for 'SQL*Net message from client' 38: 39: waiting for 'rdbms ipc message' 40: waiting for 'rdbms ipc message' 41: waiting for 'rdbms ipc message' 42: waiting for 'rdbms ipc message' 43: waiting for 'class slave wait' 44: waiting for 'rdbms ipc message' 45: waiting for 'rdbms ipc message' 46: waiting for 'SQL*Net message from client' 47: waiting for 'class slave wait' 48: waiting for 'class slave wait' 49: 50: 51: waiting for 'rdbms ipc message' 52: waiting for 'SQL*Net message from client' 53: waiting for 'RFS dispatch' 54: waiting for 'SQL*Net message from client' 55: waiting for 'SQL*Net message from client' 56: waiting for 'class slave wait' 57: waiting for 'MRP inactivation' 58: waiting for 'SQL*Net message from client' 59: 60: waiting for 'class slave wait' 61: waiting for 'parallel recovery slave next change' 62: 63: waiting for 'parallel recovery slave next change' 64: waiting for 'parallel recovery slave next change' 65: waiting for 'parallel recovery slave next change' 66: waiting for 'parallel recovery slave next change' 67: waiting for 'parallel recovery slave next change' 68: waiting for 'class slave wait' 69: waiting for 'class slave wait' 70: waiting for 'parallel recovery slave next change' 71: waiting for 'parallel recovery slave next change' 72: waiting for 'parallel recovery slave next change' 73: waiting for 'parallel recovery slave next change' 74: waiting for 'parallel recovery slave next change' 75: waiting for 'parallel recovery slave next change' 76: waiting for 'parallel recovery slave next change' 77: waiting for 'gc buffer busy release' 78: waiting for 'parallel recovery slave next change' 79: waiting for 'parallel recovery slave next change' 80: waiting for 'parallel recovery slave next change' 81: waiting for 'parallel recovery slave next change' 82: waiting for 'parallel recovery slave next change' 83: waiting for 'parallel recovery slave next change' 84: waiting for 'parallel recovery slave next change' 85: waiting for 'parallel recovery slave next change' 86: waiting for 'parallel recovery slave next change' 87: waiting for 'parallel recovery slave next change' 89: waiting for 'buffer busy waits' (0x3,0x80,0x11)[Buffer 0x00c00080] Final Blocker: inst: 1, sid: 1516, ser: 3791: 92: waiting for 'buffer busy waits' (0x7,0x229d,0x62)[Buffer 0x01c0229d] Final Blocker: inst: 1, sid: 633, ser: 5193: waiting for 'SQL*Net message from client' 96: waiting for 'direct path read' 97: waiting for 'SQL*Net message from client' 98: waiting for 'buffer busy waits' (0x7,0x229d,0x62)[Buffer 0x01c0229d] Final Blocker: inst: 1, sid: 633, ser: 5199: waiting for 'direct path read'
Blockers~~~~~~~~
Above is a list of all the processes. If they are waiting for a resource then it will be given in square brackets. Below is a summary of the waited upon resources, together with the holder of that resource. Notes: ~~~~~ o A process id of '???' implies that the holder was not found in the systemstate. (The holder may have released the resource before we dumped the state object tree of the blocking process). o Lines with 'Enqueue conversion' below can be ignored *unless* other sessions are waiting on that resource too. For more, see http://gbr30026.uk.oracle.com:81/Public/TOOLS/Ass.html#enqcnv)
Resource Holder State Buffer 0x00c00080 ??? Blocker Buffer 0x01c0229d ??? Blocker
Warning: The following processes have multiple session state objects andmay not be properly represented above : 29: 53:
Object Names~~~~~~~~~~~~Buffer 0x00c00080 Buffer 0x01c0229d
Summary of Wait Events Seen (count>10)~~~~~~~~~~~~~~~~~~~~~~~~~~~ No wait events seen more than 10 times
复制代码


4.3 查询 gc buffer busy release 会话信息


下面就是通过 UE 软件来定位详细的信息了,查看会话和等待事件信息。


ROCESS 77: PR0E  ----------------------------------------  SO: 0x7000103e48c42a0, type: 2, owner: 0x0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3   proc=0x7000103e48c42a0, name=process, file=ksu.h LINE:12721 ID:, pg=0  (process) Oracle pid:77, ser:1, calls cur/top: 0x70001039b0808d0/0x70001039b0805f0            flags : (0x0) -            flags2: (0x30),  flags3: (0x10)             intr error: 0, call error: 0, sess error: 0, txn error 0            intr queue: empty    ksudlp FALSE at location: 0  (post info) last post received: 704768 0 2              last post received-location: ksl2.h LINE:2374 ID:kslpsr              last process to post me: 0x7000103ea81a930 1 6              last post sent: 0 0 160              last post sent-location: kcb2.h LINE:4243 ID:kcbzww              last process posted by me: 0x7000103ea829260 5 0    (latch info) wait_event=704768 bits=0x0    Process Group: DEFAULT, pseudo proc: 0x7000103e29af078    O/S info: user: oracle, term: UNKNOWN, ospid: 5767982     OSD pid info: Unix process pid: 5767982, image: oracle@jklcrzdb1 (PR0E)    Short stack dump: ksedsts()+240<-ksdxfstk()+44<-ksdxcb()+3432<-sspuser()+116<-__sighandler()<-thread_wait()+580<-sskgpwwait()+32<-skgpwwait()+180<-ksliwat()+10772<-kslwaitctx()+180<-kslwait()+112<-kclwlr()+644<-kcbzfc()+2708<-kcbr_media_apply()+1888<-krp_slave_apply()+344<-krp_slave_main()+1092<-ksvrdp()+1804<-opirip()+724<-opidrv()+608<-sou2o()+136<-opimai_real()+188<-ssthrdmain()+276<-main()+204<-__start()+112    ----------------------------------------    SO: 0x7000103eaaf1038, type: 4, owner: 0x7000103e48c42a0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3     proc=0x7000103e48c42a0, name=session, file=ksu.h LINE:12729 ID:, pg=0    (session) sid: 633 ser: 51 trans: 0x0, creator: 0x7000103e48c42a0              flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-              flags2: (0x2080409) -/-/INC              DID: , short-term DID:               txn branch: 0x0              edition#: 0              oct: 0, prv: 0, sql: 0x0, psql: 0x0, user: 0/SYS    ksuxds FALSE at location: 0    service name: SYS$USERS    client details:      O/S info: user: oracle, term: UNKNOWN, ospid: 5767982      machine: jklcrzdb1 program: oracle@jklcrzdb1 (PR0E)    Current Wait Stack:     0: waiting for 'gc buffer busy release'        file#=0x7, block#=0x229d, class#=0x62        wait_id=258197477 seq_num=55118 snap_id=1        wait times: snap=0.007627 sec, exc=0.007627 sec, total=0.007627 sec        wait times: max=1.000000 sec, heur=24 min 59 sec        wait counts: calls=1 os=1        in_wait=1 iflags=0x5a8    There are 2 sessions blocked by this session.    Dumping one waiter:      inst: 1, sid: 255, ser: 451      wait event: 'buffer busy waits'        p1: 'file#'=0x7        p2: 'block#'=0x229d        p3: 'class#'=0x62      row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0      min_blocked_time: 2404 secs, waiter_cache_ver: 27635    Wait State:      fixed_waits=0 flags=0x22 boundary=0x0/-1
复制代码


下面 GC 正在等待的数据块的信息。


     proc=0x7000103ea8281b8, name=call, file=ksu.h LINE:12725 ID:, pg=0    (call) sess: cur 7000103f8ec4eb0, rec 0, usr 7000103f8ec4eb0; flg:20 fl2:1; depth:0    svpt(xcb:0x0 sptn:0x4286 uba: 0x00000000.0000.00)    ksudlc FALSE at location: 0      ----------------------------------------      SO: 0x7000103d4e2c040, type: 38, owner: 0x70001039b096df0, flag: INIT/-/-/0xc0 if: 0x1 c: 0x1       proc=0x7000103ea8281b8, name=buffer handle, file=kcb2.h LINE:2761 ID:, pg=0      (buffer) (CR) PR: 0x7000103ea8281b8 FLG: 0x0      lock rls: 0x700010029dc27d8, class bit: 0x0       cr[0]:       sh[0]:      kcbbfbp: [BH: 0x700010029dc27d8, LINK: 0x7000103d4e2c0c0] (WAITING)      type: normal pin      where: ktuwh27: kturbk, why: 0      BH (0x700010029dc27d8) file#: 7 rdba: 0x01c0229d (7/8861) class: 98 ba: 0x70001002896e000        set: 62 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,0        dbwrid: 1 obj: -1 objn: -1 tsn: 2 afn: 7 hint: f        hash: [0x7000103e7d29d30,0x700010231e8cbf0] lru: [0x700010237e08048,0x700010117dcdec8]        lru-flags: on_auxiliary_list        ckptq: [NULL] fileq: [0x700010143e35aa8,0x700010237e07f68] objq: [0x700010143e35bb0,0x700010237e08070] objaq: [0x700010237e08080,0x700010117dcdf00]        use: [NULL] wait: [0x7000103d712cff0,0x7000103d4e2c0c0]        st: MEDIA_RCV md: NULL rsop: 0x700010394bdde40 tch: 1 le: 0x7000102f3ed7608 rlscn: 0x0bac.feaeaa51        flags: down_grade_lock only_sequential_access block_written_once        Waiting State Objects          ----------------------------------------          SO: 0x7000103d712cf70, type: 38, owner: 0x70001039b0a69c0, flag: INIT/-/-/0xc0 if: 0x1 c: 0x1           proc=0x7000103ea829260, name=buffer handle, file=kcb2.h LINE:2761 ID:, pg=0          (buffer) (CR) PR: 0x7000103ea829260 FLG: 0x0          lock rls: 0x700010029dc27d8, class bit: 0x0           cr[0]:           sh[0]:          kcbbfbp: [BH: 0x700010029dc27d8, LINK: 0x7000103d712cff0] (WAITING)          type: normal pin          where: ktuwh27: kturbk, why: 0          ----------------------------------------          SO: 0x7000103d4e2c040, type: 38, owner: 0x70001039b096df0, flag: INIT/-/-/0xc0 if: 0x1 c: 0x1           proc=0x7000103ea8281b8, name=buffer handle, file=kcb2.h LINE:2761 ID:, pg=0          (buffer) (CR) PR: 0x7000103ea8281b8 FLG: 0x0          lock rls: 0x700010029dc27d8, class bit: 0x0           cr[0]:           sh[0]:          kcbbfbp: [BH: 0x700010029dc27d8, LINK: 0x7000103d4e2c0c0] (WAITING)          type: normal pin          where: ktuwh27: kturbk, why: 0
GLOBAL CACHE ELEMENT DUMP (address: 0x7000102f3ed7608): id1: 0x229d id2: 0x7 pkey: INVALID block: (7/8861) lock: X rls: 0x7 acq: 0x0 latch: 10 flags: 0x20 fair: 255 recovery: 0 fpin: 'kclwh2' bscn: 0xbac.feaeaa51 bctx: 0x0 write: 0 scan: 0x0 lcp: 0x0 lnk: [NULL] lch: [0x700010029dc2910,0x700010029dc2910] seq: 1054 hist: 54 113 238 180 113 238 180 143:0 208 352 32 197 48 121 45 299 LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT: flg: 0x00280400 lflg: 0x4 state: MEDIA_RCV tsn: 2 tsh: 1 waiters: 2 addr: 0x700010029dc27d8 obj: INVALID cls: UNDO BLOCK bscn: 0xbac.feaeaa51 buffer tsn: 2 rdba: 0x01c0229d (7/8861) scn: 0x0bac.feaeaa51 seq: 0x09 flg: 0x04 tail: 0xaa510209 frmt: 0x02 chkval: 0x3bf8 type: 0x02=KTU UNDO BLOCK
复制代码


这里大概定位到处问题的数据块、内存中的状态(state: MEDIA_RCV)了,这种特殊情况怀疑都是 BUG 导致。


5.处理办法


其实说到这儿,聪明的你肯定已经猜到——这多半又是个“祖传 BUG”。于是我火速上 MOS 一搜,果然不出所料,发现了下面这个“熟悉的面孔”:


5.1 搜 MOS 发现如下 BUG



Bug 17695685 Hang in Active Dataguard Database This note gives a brief overview of bug 17695685. The content was last updated on: 17-DEC-2014 Click here for details of each of the sections below.Affects: Product (Component) Oracle Server (Rdbms) Range of versions believed to be affected Versions >= 11.2 Versions confirmed as being affected • 11.2.0.4 • 11.2.0.3 Platforms affected Generic (all / most platforms affected) Fixed: The fix for 17695685 is first included in • (None Specified) Interim patches may be available for earlier versions - click here to check. Symptoms: Related To: • Hang (Process Hang) • Waits for "gc buffer busy release" • Active Dataguard (ADG) • RAC (Real Application Clusters) / OPS • MRP process Description This bug is only relevant when using Real Application Clusters (RAC) Rediscovery:- There is hang in Active Dataguard Database (ADG)- MRP or its slave waits for a buffer with state: MEDIA_RCV For example a hanganalyze trace might show something like: Oracle session identified by: { instance: 1 (dtcpcta.dtcpcta1) os id: 31520 process id: 82, oraclecolldtdbpr31 (PR0J) session id: 982 session serial #: 1157 } which is waiting for 'gc buffer busy release' with wait info: { p1: 'file#'=0x3 p2: 'block#'=0x1399a p3: 'class#'=0x1c time in wait: 0.567673 sec heur. time in wait: 40 min 7 sec timeout after: 0.432327 sec wait id: 35008931 blocking: 2737 sessions current sql: <none> short stack: ksedsts()+465<- ... <-semtimedop()+10<-skgpwwait()+160<-ksliwat()+2022<-kslwaitctx()+163<-kslwait()+141 <-kclwlr()+535<-kcbzfc()+656<-kcbr_media_apply()+1782<-krp_slave_apply()+284<-krp_slave_main() GLOBAL CACHE ELEMENT DUMP (address: 0xf7e7c360): id1: 0x1399a id2: 0x3 pkey: INVALID block: (3/80282) lock: X rls: 0x7 acq: 0x0 latch: 3 flags: 0x20 fair: 255 recovery: 0 fpin: 'kclwh2' bscn: 0x2.3927d9d3 bctx: (nil) write: 0 scan: 0x0 lcp: (nil) lnk: [NULL] lch: [0x31f91f2b0,0x31f91f2b0] seq: 89 hist: 54 113 238 180 113 238 180 113 238 180 113 238 180 113 238 180 113 238 180 113 LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:flg: 0x00280400 lflg: 0x8 state: MEDIA_RCV tsn: 2 tsh: 4 waiters: 4 - There are waiters for "Media Recovery" buffer
复制代码


目前在 AIX 平台只有 11.2.0.4.5 下面有 PATCH 包,现在环境是 11.2.0.4.2,需要先打 PSU 到 11.2.0.4.5


5.2 关闭一个实例


可以先关闭一个实例就不会出现这个故障了。


5.3 配置参数


还有个“歪招”——把_fairness_threshold 参数设成 0,虽然能暂时压住 BUG,但副作用是数据库性能会“自闭”,慎用慎用!


结尾示例


好啦,今天的“悬疑大戏”就到这里。遇到这种 GC 等待、MRP hang 住的奇葩问题,别慌,按上面套路走一遍,基本都能搞定。实在不行,咱们还有 MOS 和补丁兜底。


文章转载自:认真就输

原文链接:https://www.cnblogs.com/www-htz-pw/p/18926363/gu-zhang-zhen-duanasm-mo-ming-chu-xiangc-deng-dai

体验地址:http://www.jnpfsoft.com/?from=001YH

用户头像

还未添加个人签名 2025-04-01 加入

还未添加个人简介

评论

发布
暂无评论
故障诊断:ASM莫名出现GC等待事件、ADG的MRP进程HANG住_故障_电子尖叫食人鱼_InfoQ写作社区