Notes on a NameNode GC-induced failure and how it was resolved

Posted 2014-11-05 20:29

Symptoms:

The Hadoop cluster was in trouble and every HDFS operation was failing. Dozens of jobs reported errors such as:

FAILED: RuntimeException org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby

or:

FAILED: RuntimeException org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby

Root cause analysis:

1) The NameNode (nn) log showed that the failure happened while writing to the JournalNodes (jn), so a network problem was the first suspect. With 5 JournalNodes, the quorum (majority) rule means 3 of them would have to fail before writes break. Although some of the jn and nn machines are not on the same switch, monitoring showed no network anomaly, so a network problem was ruled out.

2) A closer reading of the nn log, together with the source code, showed that at that moment the NameNode went through a very long full GC (96 s). That made the JournalNode writes time out (the default timeout is 20 s), and the NameNode process exited. The jn logs confirm this: no further requests arrived from the nn afterwards, because the nn was already dead. (See the sketch below for how such a pause gets detected and logged.)
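The "Detected pause in JVM or host machine (eg GC)" warning that appears in the NameNode log further down comes from Hadoop's JvmPauseMonitor. Below is a minimal Java sketch of the idea behind it, not the real implementation; the class name PauseSketch and both thresholds are made up for illustration. The trick is to sleep for a fixed interval, measure the real elapsed wall-clock time, and warn when the overshoot is large, because a large overshoot means the whole JVM was stalled, typically by a stop-the-world GC.

// Sketch only: detects long whole-JVM stalls the same way JvmPauseMonitor does,
// by comparing the requested sleep time with the actual elapsed wall-clock time.
public class PauseSketch {
  private static final long SLEEP_MS = 500;     // polling interval
  private static final long WARN_MS = 10_000;   // assumed pause threshold

  public static void main(String[] args) throws InterruptedException {
    while (true) {
      long start = System.nanoTime();
      Thread.sleep(SLEEP_MS);
      long elapsedMs = (System.nanoTime() - start) / 1_000_000;
      long pauseMs = elapsedMs - SLEEP_MS;
      if (pauseMs > WARN_MS) {
        // In this incident the pause (~96 s) was far above the 20 s QJM write
        // timeout, so the edit-log flush failed and the NameNode aborted.
        System.err.println("Detected pause of approximately " + pauseMs + "ms (eg GC)");
      }
    }
  }
}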

GC log:


2014-05-25T05:59:53.578+0800: 2478553.316: [Full GC [PSYoungGen: 2784K->0K(4179328K)] [PSOldGen: 28566944K->9208608K(28573696K)] 28569728K->9208608K(32753024K) [PSPermGen: 41405K->41405K(524288K)], 95.6706740 secs] [Times: user=95.62 sys=0.00, real=95.67 secs]
Heap
PSYoungGen      total 4179328K, used 87037K [0x00007f70e5000000, 0x00007f71e5000000, 0x00007f71e5000000)
  eden space 4164800K, 2% used [0x00007f70e5000000,0x00007f70ea4ff590,0x00007f71e3330000)
  from space 14528K, 0% used [0x00007f71e3330000,0x00007f71e3330000,0x00007f71e4160000)
  to   space 14336K, 0% used [0x00007f71e4200000,0x00007f71e4200000,0x00007f71e5000000)
PSOldGen        total 28573696K, used 9208608K [0x00007f6a15000000, 0x00007f70e5000000, 0x00007f70e5000000)
  object space 28573696K, 32% used [0x00007f6a15000000,0x00007f6c470c8078,0x00007f70e5000000)
PSPermGen       total 524288K, used 41466K [0x00007f69f5000000, 0x00007f6a15000000, 0x00007f6a15000000)
  object space 524288K, 7% used [0x00007f69f5000000,0x00007f69f787e8f8,0x00007f6a15000000)

JournalNode (jn) log:

2014-05-25 05:58:45,432 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /home/vipshop/hard_disk/nn_shared_disk/bipcluster/current/edits_inprogress_0000000001665741687 -> /home/vipshop/hard_disk/nn_shared_disk/bipcluster/current/edits_0000000001665741687-0000000001665777062
2014-05-25 06:44:54,299 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNode: RECEIVED SIGNAL 15: SIGTERM
2014-05-25 06:44:54,302 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNode: SHUTDOWN_MSG:

NameNode (nn) error log:
2014-05-25 06:01:29,258 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 96275 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2014-05-25 06:01:29,259 WARN org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 96145ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=95670ms
GC pool 'PS Scavenge' had collection(s): count=1 time=592ms
2014-05-25 06:01:29,259 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 96281ms to send a batch of 2 edits (358 bytes) to remote journal xxx:8485
2014-05-25 06:01:29,259 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 96283ms to send a batch of 2 edits (358 bytes) to remote journal xxx:8485
2014-05-25 06:01:29,259 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 20019 Total time for transactions(ms): 436 Number of transactions batched in Syncs: 9441 Number of syncs: 5160 SyncTimes(ms): 41623 9305
2014-05-25 06:01:29,259 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 96275ms to send a batch of 13 edits (1638 bytes) to remote journal xxx:8485
2014-05-25 06:01:29,259 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [xxx:8485, xxx:8485, xxx:8485, xxx:8485, xxx:8485], stream=QuorumOutputStream starting at txid 1665777063))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:53)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1879)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1845)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:439)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:207)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44942)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)
2014-05-25 06:01:29,259 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 96276ms to send a batch of 13 edits (1638 bytes) to remote journal 10.201.202.21:8485
2014-05-25 06:01:29,263 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /tmp/hive-hdfs/hive_2014-05-25_05-59-16_296_5866222407565920996/_task_tmp.-ext-10002/_tmp.001379_0. BP-180494678-xxx-1366627257763 blk_-3717787322078480343_437088977{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[xxx:50010|RBW], ReplicaUnderConstruction[xxx:50010|RBW], ReplicaUnderConstruction[xxx:50010|RBW]]}
2014-05-25 06:01:29,263 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault: Not able to place enough replicas, still in need of 2 to reach 3
For more information, please enable DEBUG log level on org.apache.commons.logging.impl.Log4JLogger
2014-05-25 06:01:29,263 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 1665777063
......
2014-05-25 06:01:29,268 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 96285ms to send a batch of 13 edits (1638 bytes) to remote journal xxx:8485
......
2014-05-25 06:01:29,324 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-05-25 06:01:29,364 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

When the JournalNode write timed out, the terminate method of the ExitUtil class was triggered, which kills the current process.

In the JournalSet class:


    for (JournalAndStream jas : journals) {
      try {
        closure.apply(jas);
      } catch (Throwable t) {
        if (jas.isRequired()) {
          final String msg = "Error: " + status + " failed for required journal ("
              + jas + ")";
          LOG.fatal(msg, t);
          // If we fail on *any* of the required journals, then we must not
          // continue on any of the other journals. Abort them to ensure that
          // retry behavior doesn't allow them to keep going in any way.
          abortAllJournals();
          // the current policy is to shutdown the NN on errors to shared edits
          // dir. There are many code paths to shared edits failures - syncs,
          // roll of edits etc. All of them go through this common function
          // where the isRequired() check is made. Applying exit policy here
          // to catch all code paths.
          terminate(1, msg);
        } else {
          LOG.error("Error: " + status + " failed for (journal " + jas + ")", t);
          badJAS.add(jas);
        }
      }
    }

The terminate method of the ExitUtil class calls System.exit:


  /**
   * Terminate the current process. Note that terminate is the *only* method
   * that should be used to terminate the daemon processes.
   * @param status exit code
   * @param msg message used to create the ExitException
   * @throws ExitException if System.exit is disabled for test purposes
   */
  public static void terminate(int status, String msg) throws ExitException {
    LOG.info("Exiting with status " + status);
    if (systemExitDisabled) {
      ExitException ee = new ExitException(status, msg);
      LOG.fatal("Terminate called", ee);
      if (null == firstExitException) {
        firstExitException = ee;
      }
      throw ee;
    }
    System.exit(status);
  }


Immediate handling:

The cluster was restarted and service was back to normal at around 06:50.

Follow-up fixes:

1) Increase the JournalNode write timeout (see the example configuration after this list):

dfs.qjournal.write-txns.timeout.ms

2) Tune the NameNode JVM parameters so that full GC is triggered earlier; each full GC then has less work to do and finishes faster.

3) The NameNode's default full GC uses the parallel collector, which is entirely stop-the-world; switch to the CMS collector instead by adjusting the NameNode startup parameters:

-XX:+UseCompressedOops
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
-XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0
-XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75
-XX:SoftRefLRUPolicyMSPerMB=0
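
A minimal configuration sketch for item 1: the timeout is an hdfs-site.xml property whose name and 20000 ms default are the ones visible in the log above ("timeout=20000 ms"); the 60000 ms value below is only an assumed illustration, not a number given in the original post. The CMS flags from item 3 would typically be appended to HADOOP_NAMENODE_OPTS in hadoop-env.sh.

<!-- hdfs-site.xml (sketch): raise the QJM write timeout so that a long GC pause
     does not immediately abort the NameNode. -->
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>60000</value>  <!-- assumed example value; the shipped default is 20000 -->
</property>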

Another gotcha: Flume does not recover from this on its own and has to be restarted, otherwise it keeps throwing java.lang.InterruptedException: sleep interrupted.

Source: http://caiguangguang.blog.51cto.com/1652935/1418895
