
[Hive] A Summary of HiveServer2 Out-of-Memory Issues

Author: 扬_帆_起_航
  • 2024-08-02
    Beijing

1. Introduction

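Users ran offline SQL jobs against HiveServer2 (version 3.1.2) through Beeline. After about a week of continuous operation, HiveServer2 ran out of memory (OOM), which seriously affected data queries and report generation. The problem was finally resolved after several rounds of fixes. The author has collected the fixed issues here so that anyone who hits the same problem has a place to start.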

2. Cases

2.1 HIVE-16455

HiveServer2 leaks file handles when the ADD JAR statement is used.


[root@host-10-17-80-111 ~]# lsof -p 29588 | grep "(deleted)" | wc -l
java    29588 hive  391u   REG              252,3    125987  2099944 /tmp/57d98f5b-1e53-44e2-876b-6b4323ac24db_resources/hive-contrib.jar (deleted)
java    29588 hive  392u   REG              252,3    125987  2099946 /tmp/eb3184ad-7f15-4a77-a10d-87717ae634d1_resources/hive-contrib.jar (deleted)
java    29588 hive  393r   REG              252,3    125987  2099825 /tmp/e29dccfc-5708-4254-addb-7a8988fc0500_resources/hive-contrib.jar (deleted)
java    29588 hive  394r   REG              252,3    125987  2099833 /tmp/5153dd4a-a606-4f53-b02c-d606e7e56985_resources/hive-contrib.jar (deleted)
java    29588 hive  395r   REG              252,3    125987  2099827 /tmp/ff3cdb05-917f-43c0-830a-b293bf397a23_resources/hive-contrib.jar (deleted)
java    29588 hive  396r   REG              252,3    125987  2099822 /tmp/60531b66-5985-421e-8eb5-eeac31fdf964_resources/hive-contrib.jar (deleted)
java    29588 hive  397r   REG              252,3    125987  2099831 /tmp/78878921-455c-438c-9735-447566ed8381_resources/hive-contrib.jar (deleted)
java    29588 hive  399r   REG              252,3    125987  2099835 /tmp/0e5d7990-30cc-4248-9058-587f7f1ff211_resources/hive-contrib.jar (deleted)
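To watch this kind of leak over time, it can help to group the deleted-but-still-open entries by file name. The following is a minimal sketch, not part of the original investigation. It assumes the text comes from `lsof -p <pid>`, with one entry per line and no spaces in the paths. The function name `count_deleted_handles` and the third sample line are made up for illustration.

```python
from collections import Counter


def count_deleted_handles(lsof_output: str) -> Counter:
    """Count open-but-deleted files per file name in `lsof -p <pid>` output."""
    counts = Counter()
    for line in lsof_output.splitlines():
        line = line.strip()
        if not line.endswith("(deleted)"):
            continue
        # The path is the last whitespace-separated field before "(deleted)".
        path = line[: -len("(deleted)")].split()[-1]
        counts[path.rsplit("/", 1)[-1]] += 1
    return counts


# The first two lines come from the listing above; the third is a made-up
# entry for a file that is still present on disk.
sample = """\
java    29588 hive  391u   REG              252,3    125987  2099944 /tmp/57d98f5b-1e53-44e2-876b-6b4323ac24db_resources/hive-contrib.jar (deleted)
java    29588 hive  392u   REG              252,3    125987  2099946 /tmp/eb3184ad-7f15-4a77-a10d-87717ae634d1_resources/hive-contrib.jar (deleted)
java    29588 hive   10r   REG              252,3    125987  2099000 /opt/example/lib/example.jar
"""

for name, n in count_deleted_handles(sample).most_common():
    print(name, n)
```

A count for the same JAR name that keeps growing between samples suggests that handles are not released when sessions close.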

2.2 HIVE-24236

Hard to reproduce: a connection leak can occur only under certain specific conditions.


2020-09-29T18:44:26,563 INFO  [Heartbeater-0]: txn.TxnHandler (TxnHandler.java:checkRetryable(3733)) - Non-retryable error in heartbeat(HeartbeatRequest(lockid:0, txnid:11908)) : Cannot get a connection, general error (SQLState=null, ErrorCode=0)
2020-09-29T18:44:26,564 ERROR [Heartbeater-0]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(201)) - MetaException(message:Unable to select from transaction database org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, general error
        at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:118)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3605)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.getDbConn(TxnHandler.java:3598)
        at org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2739)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8452)
        at sun.reflect.GeneratedMethodAccessor415.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
        at com.sun.proxy.$Proxy63.heartbeat(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:3247)
        at sun.reflect.GeneratedMethodAccessor414.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213)
        at com.sun.proxy.$Proxy64.heartbeat(Unknown Source)
        at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:671)
        at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.lambda$run$0(DbTxnManager.java:1102)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
        at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:1101)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
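The stack trace shows a pool that can no longer hand out connections. The toy model below shows how a connection that is not returned on an error path eventually produces that symptom, and how a try/finally style release avoids it. This is a sketch under stated assumptions: `TinyPool`, `heartbeat_leaky`, and `heartbeat_safe` are hypothetical names, and the code does not reproduce the actual TxnHandler logic or the Hive patch.

```python
import queue
from contextlib import contextmanager


class TinyPool:
    """Toy bounded pool; it stands in for the pool seen in the stack trace."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")

    def acquire(self):
        try:
            return self._free.get_nowait()
        except queue.Empty:
            raise RuntimeError("Cannot get a connection") from None

    def release(self, conn):
        self._free.put(conn)

    def available(self):
        return self._free.qsize()

    @contextmanager
    def connection(self):
        conn = self.acquire()
        try:
            yield conn
        finally:
            # Runs on success and on error, so the connection always returns.
            self.release(conn)


def heartbeat_leaky(pool, fail):
    conn = pool.acquire()
    if fail:
        # Simulated error path: the connection is never released.
        raise ValueError("simulated SQL error")
    pool.release(conn)


def heartbeat_safe(pool, fail):
    with pool.connection():
        if fail:
            raise ValueError("simulated SQL error")
```

With the leaky variant, every failed call permanently removes one connection from the pool, so the error only shows up after enough failures have accumulated. That fits an issue that is hard to reproduce.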

2.3 HIVE-24552

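A call to loadDynamicPartitions (Hive.java) spawns multiple threads to handle FileMove. These threads may open HiveMetaStore connections, and if those connections are not closed promptly, a large number of them accumulate.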


2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43901
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43900
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43899
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43898
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] Closed a connection to metastore, current connections: 43897
2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec [mdc@18060 class="transport.TIOStreamTransport" level="WARN" thread="Finalizer"] Error closing output stream.
java.net.SocketException: Socket closed
  at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
  at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
  at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
  at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
  at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
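The "current connections" counter in these log lines is a convenient early-warning signal. The sketch below extracts it from log text and flags values above a threshold. It assumes one log entry per line and a message that ends with `current connections: <n>`. The function names and the threshold are illustrative, not taken from the source.

```python
import re

# Matches the counter at the end of a log line.
CONN_RE = re.compile(r"current connections: (\d+)[ \t]*$", re.MULTILINE)


def connection_counts(log_text):
    """Return every 'current connections' value found, in log order."""
    return [int(n) for n in CONN_RE.findall(log_text)]


def exceeds(log_text, threshold):
    """True if any logged connection count is above the threshold."""
    counts = connection_counts(log_text)
    return bool(counts) and max(counts) > threshold


# Two lines taken from the log excerpt above.
sample_log = (
    '2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec '
    '[mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] '
    "Closed a connection to metastore, current connections: 43901\n"
    '2020-12-15T17:05:38.485Z hiveserver2-0 hiveserver2 1 a3671b96-74fb-4ee9-b186-aeff0de0bbec '
    '[mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="Finalizer"] '
    "Closed a connection to metastore, current connections: 43900\n"
)

print(connection_counts(sample_log), exceeds(sample_log, 1000))
```

In the excerpt, the connections are closed by the `Finalizer` thread, which means they are released only when garbage collection gets to them rather than when the work finishes.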

2.4 HIVE-24858

If a session registers a UDF JAR and creates a temporary function from it, the UDFClassLoader is not garbage collected when the session is closed.


Class Name                                                                                                                          | Shallow Heap | Retained Heap
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
contextClassLoader org.apache.hive.service.server.ThreadWithGarbageCleanup @ 0x7164deb50  HiveServer2-Handler-Pool: Thread-72 Thread|          128 |        79,072
referent java.util.WeakHashMap$Entry @ 0x7164e67d0                                                                                  |           40 |           824
'- [6] java.util.WeakHashMap$Entry[16] @ 0x71581aac0                                                                                |           80 |         5,056
   '- table java.util.WeakHashMap @ 0x71580f510                                                                                     |           48 |         6,920
      '- CACHE_CLASSES class org.apache.hadoop.conf.Configuration @ 0x71580f3d8                                                     |           64 |        74,528
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
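One way to read the heap dump is that the class loader is referenced weakly by a class cache and strongly by the `contextClassLoader` field of a pooled handler thread, so the weak entry is never cleared. The toy model below illustrates that mechanism with Python's `weakref.WeakKeyDictionary`. It is an analogy under that assumption, not Hive code, and `UdfLoader` and `HandlerThread` are hypothetical stand-ins.

```python
import gc
import weakref


class UdfLoader:
    """Hypothetical stand-in for the session's UDF class loader."""


class HandlerThread:
    """Hypothetical stand-in for a pooled handler thread."""

    def __init__(self):
        self.context_loader = None


def cached_entries_after_close(reset_context_loader):
    cache = weakref.WeakKeyDictionary()  # plays the role of a weak-keyed class cache
    thread = HandlerThread()
    loader = UdfLoader()
    thread.context_loader = loader       # the session installs its loader on the thread
    cache[loader] = {"my_udf": "MyUdfClass"}
    if reset_context_loader:
        thread.context_loader = None     # cleanup performed when the session closes
    del loader                           # the session drops its own reference
    gc.collect()
    return len(cache)


print(cached_entries_after_close(False), cached_entries_after_close(True))
```

As long as the pooled thread keeps its reference, the cache entry survives. Once that reference is reset on session close, the entry can be collected.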

2.5 HIVE-26404

HiveMetaStore stops responding because of long JVM garbage-collection pauses. org.apache.hadoop.conf.Configuration objects take up too much heap, which creates a risk of OOM.


Class Name                                                                             | Shallow Heap | Retained Heap
----------------------------------------------------------------------------------------------------------------------
org.apache.hadoop.fs.FileSystem$Cache @ 0x45403fe70                                    |           32 |   108,671,824
|- <class> class org.apache.hadoop.fs.FileSystem$Cache @ 0x45410c3e0                   |            8 |           544
'- map java.util.HashMap @ 0x453ffb598                                                 |           48 |    92,777,232
   |- <class> class java.util.HashMap @ 0x4520382c8 System Class                       |           40 |           168
   |- entrySet java.util.HashMap$EntrySet @ 0x454077848                                |           16 |            16
   '- table java.util.HashMap$Node[32768] @ 0x463585b68                                |      131,088 |    92,777,168
      |- class java.util.HashMap$Node[] @ 0x4520b7790                                  |            0 |             0
      '- [1786] java.util.HashMap$Node @ 0x451998ce0                                   |           32 |         9,968
         |- <class> class java.util.HashMap$Node @ 0x4520b7728 System Class            |            8 |            32
         '- value org.apache.hadoop.hdfs.DistributedFileSystem @ 0x452990178           |           56 |         4,976
            |- <class> class org.apache.hadoop.hdfs.DistributedFileSystem @ 0x45402e290|            8 |         4,664
            |- uri java.net.URI @ 0x451a05cd0  hdfs://nameservice1                     |           80 |           432
            |- dfs org.apache.hadoop.hdfs.DFSClient @ 0x451f5d9b8                      |          128 |         3,824
            '- conf org.apache.hadoop.hive.conf.HiveConf @ 0x453a34b38                 |           80 |       250,160
----------------------------------------------------------------------------------------------------------------------
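The heap dump shows a file-system cache whose map holds many `DistributedFileSystem` entries, each with its own `HiveConf`. The text does not state the root cause, so the toy model below rests on an assumption: the cache key includes a per-request user object that is compared by identity, so every request adds a new entry unless the entries are evicted when the request ends. `User`, `FsCache`, and `handle_request` are hypothetical names. In Hadoop itself, the corresponding cleanup call would be `FileSystem.closeAllForUGI`.

```python
class User:
    """Hypothetical per-request user object; equality is by identity."""

    def __init__(self, name):
        self.name = name


class FsCache:
    """Toy cache keyed by (scheme, authority, user object)."""

    def __init__(self):
        self._map = {}

    def get(self, scheme, authority, user):
        key = (scheme, authority, user)
        if key not in self._map:
            # Each entry carries its own configuration object, as in the dump.
            self._map[key] = {"uri": f"{scheme}://{authority}", "conf": object()}
        return self._map[key]

    def close_all_for_user(self, user):
        for key in [k for k in self._map if k[2] is user]:
            del self._map[key]

    def __len__(self):
        return len(self._map)


def handle_request(cache, name, cleanup):
    user = User(name)  # a fresh object per request, even for the same name
    try:
        cache.get("hdfs", "nameservice1", user)
    finally:
        if cleanup:
            cache.close_all_for_user(user)
```

Without the cleanup step, the cache gains one entry per request even though every request targets the same URI.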

2.6 HIVE-22275

When a single Hive session executes multiple SQL statements, OperationManager.queryIdOperation is not cleaned up properly, which creates a risk of OOM.


2019-09-13T08:37:36,785 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=dfed4c18-a284-4640-9f4a-1a20527105f9]
2019-09-13T08:37:38,432 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083736_c49cf3cc-cfe8-48a1-bd22-8b924dfb0396 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=dfed4c18-a284-4640-9f4a-1a20527105f9] with tag: null
2019-09-13T08:37:38,469 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=24d0030c-0e49-45fb-a918-2276f0941cfb]
2019-09-13T08:37:52,662 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=b983802c-1dec-4fa0-8680-d05ab555321b]
2019-09-13T08:37:56,239 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=75dbc531-2964-47b2-84d7-85b59f88999c]
2019-09-13T08:38:30,791 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=b697c801-7da0-4544-bcfa-442eb1d3bd77]
2019-09-13T08:39:10,187 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=bda93c8f-0822-4592-a61c-4701720a1a5c]
2019-09-13T08:39:15,471 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083910_c4809ca8-d8db-423c-8b6d-fbe3eee89971 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=24d0030c-0e49-45fb-a918-2276f0941cfb] with tag: null
2019-09-13T08:39:15,507 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083910_c4809ca8-d8db-423c-8b6d-fbe3eee89971 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=b983802c-1dec-4fa0-8680-d05ab555321b] with tag: null
2019-09-13T08:39:15,538 INFO  [8eaa1601-f045-4ad5-9c2e-1e5944b75f6a HiveServer2-Handler-Pool: Thread-202]: operation.OperationManager (:()) - Removed queryId: hive_20190913083910_c4809ca8-d8db-423c-8b6d-fbe3eee89971 corresponding to operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=75dbc531-2964-47b2-84d7-85b59f88999c] with tag: null
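In the log, three different operations are removed under the same queryId (the one ending in `fbe3eee89971`). This suggests that removal is keyed on the session's most recent queryId rather than on each operation's own. The toy model below shows how such a lookup leaves stale map entries behind. `ToyOperationManager` and its methods are hypothetical and do not mirror the real OperationManager API or the actual patch.

```python
class ToyOperationManager:
    """Toy model of a queryId -> operation map."""

    def __init__(self):
        self.query_id_operation = {}

    def add_operation(self, query_id, handle):
        self.query_id_operation[query_id] = handle

    def remove_by_session_query_id(self, session_query_id):
        # Buggy variant: removes whatever the session's latest queryId maps to.
        self.query_id_operation.pop(session_query_id, None)

    def remove_by_handle(self, handle):
        # Safer variant: removes the entries that belong to this operation.
        for qid in [q for q, h in self.query_id_operation.items() if h == handle]:
            del self.query_id_operation[qid]


def run_session(fixed):
    mgr = ToyOperationManager()
    ops = [("query-1", "op-a"), ("query-2", "op-b"), ("query-3", "op-c")]
    for qid, handle in ops:
        mgr.add_operation(qid, handle)
    session_query_id = ops[-1][0]  # the session only remembers its latest queryId
    for _, handle in ops:
        if fixed:
            mgr.remove_by_handle(handle)
        else:
            mgr.remove_by_session_query_id(session_query_id)
    return len(mgr.query_id_operation)


print(run_session(fixed=False), run_session(fixed=True))
```

In the buggy variant, only the entry for the latest queryId is ever removed, so a long-lived session that runs many statements keeps adding entries to the map.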

2.7 HIVE-24590

Log output files are not closed or deleted properly, so Log4j RandomAccessFileManager instances take up too much heap, which creates a risk of OOM.


3. Summary

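The author runs HiveServer2 3.1.2, a version with quite a few memory leaks. You can apply the fixes from the cases above and rebuild. If you run into other bugs or performance problems, it is worth checking the community regularly.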
