
One cause of HBase's "Not running balancer": clocks out of sync

nettman, posted 2014-1-24 16:06:41
The master log showed the following problem:

2013-11-19 06:24:35,134 DEBUG [h1,60000,1384804461419-BalancerChore] balancer.BaseLoadBalancer: Not running balancer because only 1 active regionserver(s)
2013-11-19 06:29:35,451 DEBUG [h1,60000,1384804461419-BalancerChore] balancer.BaseLoadBalancer: Not running balancer because only 1 active regionserver(s)
2013-11-19 06:34:35,578 DEBUG [h1,60000,1384804461419-BalancerChore] balancer.BaseLoadBalancer: Not running balancer because only 1 active regionserver(s)
2013-11-19 06:39:35,697 DEBUG [h1,60000,1384804461419-BalancerChore] balancer.BaseLoadBalancer: Not running balancer because only 1 active regionserver(s)

The master log shows that only one regionserver was active, yet my test environment was configured with two regionservers.
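As a rough illustration, the repeated message can be picked out of a master log with a small script. This is only a sketch: the function name `find_balancer_skips` and the regular expression are my own, written against the exact log format shown above, and other HBase versions may format the line differently.

```python
import re

# Matches the BalancerChore message in the format shown in the master log above.
BALANCER_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*"
    r"Not running balancer because only (?P<n>\d+) active regionserver\(s\)"
)

def find_balancer_skips(lines):
    """Return (timestamp, active_regionserver_count) for each matching line."""
    hits = []
    for line in lines:
        m = BALANCER_RE.search(line)
        if m:
            hits.append((m.group("ts"), int(m.group("n"))))
    return hits

# First line copied from the log above; second line is an unrelated placeholder.
sample = [
    "2013-11-19 06:24:35,134 DEBUG [h1,60000,1384804461419-BalancerChore] "
    "balancer.BaseLoadBalancer: Not running balancer because only 1 active regionserver(s)",
    "some unrelated log line",
]

for ts, count in find_balancer_skips(sample):
    print(ts, "active regionservers:", count)
```

If the reported count is lower than the number of regionservers you configured, the next step is to look at the logs of the missing regionserver, as done below.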

So I checked the logs on both regionservers.
Sure enough, one of them had exited:

2013-11-19 08:09:34,900 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
        at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:66)
        at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:85)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2311)
2013-11-19 08:09:34,901 INFO  [Shutdownhook:regionserver60020] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-4,5,main]
2013-11-19 08:09:34,901 INFO  [Shutdownhook:regionserver60020] regionserver.HRegionServer: STOPPED: Shutdown hook
2013-11-19 08:09:34,901 INFO  [Shutdownhook:regionserver60020] regionserver.ShutdownHook: Starting fs shutdown hook thread.
2013-11-19 08:09:34,906 INFO  [Shutdownhook:regionserver60020] regionserver.ShutdownHook: Shutdown hook finished.

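I tried restarting the HBase cluster; following the logs showed that this regionserver could not start.

A closer look at the regionserver log turned up the following: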


2013-11-19 08:09:34,765 FATAL [regionserver60020] regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server h2,60020,1384819773571 has been rejected; Reported time is too far out of sync with master.  Time difference of 32504ms > max allowed of 30000ms
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:235)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1926)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:790)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server h2,60020,1384819773571 has been rejected; Reported time is too far out of sync with master.  Time difference of 32504ms > max allowed of 30000ms
        at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:314)
        at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:215)
        at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1292)
        at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
        at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)

        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1446)
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
        at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
        at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:5402)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1924)
        ... 2 more


Time difference of 32504ms > max allowed of 30000ms
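This made me think of the operating system clocks on the cluster servers. After I resynchronized the time on all servers in the cluster, the regionserver started normally.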
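To make the numbers in that message explicit, here is a small sketch that extracts them from the exception text. The helper `parse_clock_skew` is a hypothetical name of mine, and the pattern only covers the wording that appears in the log above.

```python
import re

# Pattern for the "Time difference of Xms > max allowed of Yms" wording seen above.
SKEW_RE = re.compile(r"Time difference of (\d+)ms > max allowed of (\d+)ms")

def parse_clock_skew(message):
    """Return (skew_ms, max_allowed_ms), or None if the message does not match."""
    m = SKEW_RE.search(message)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

# Message text copied from the regionserver log above.
msg = (
    "Server h2,60020,1384819773571 has been rejected; Reported time is too far "
    "out of sync with master.  Time difference of 32504ms > max allowed of 30000ms"
)

skew_ms, max_ms = parse_clock_skew(msg)
print(f"skew {skew_ms} ms exceeds limit {max_ms} ms by {skew_ms - max_ms} ms")
# prints: skew 32504 ms exceeds limit 30000 ms by 2504 ms
```

So h2 was only about 2.5 seconds over the limit, which is why a simple time resync was enough.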

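The takeaway: the clocks of the servers in an HBase cluster must not differ by more than 30 seconds (the 30000 ms limit in the message above). My cluster had no time server configured, and the resulting clock drift caused this failure.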
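The rule the master applied can be sketched as a simple threshold comparison. This is not HBase's actual code (the real check lives in `ServerManager.checkClockSkew`, per the stack trace); the function name and the millisecond timestamps below are illustrative assumptions. As far as I know the limit is controlled by the `hbase.master.maxclockskew` setting, but verify that against the documentation for your version.

```python
# Limit taken from the log message above (30000 ms = 30 seconds).
MAX_SKEW_MS = 30000

def is_clock_out_of_sync(master_ms, regionserver_ms, max_skew_ms=MAX_SKEW_MS):
    """Return True if the two clocks differ by more than the allowed skew."""
    return abs(master_ms - regionserver_ms) > max_skew_ms

# Hypothetical timestamps: the regionserver clock is 32504 ms behind the master,
# matching the difference reported in the exception above.
master_now_ms = 1384819806075
regionserver_now_ms = master_now_ms - 32504

print(is_clock_out_of_sync(master_now_ms, regionserver_now_ms))  # True
print(is_clock_out_of_sync(master_now_ms, master_now_ms - 1000))  # False
```

Keeping all nodes synchronized against a common time server avoids hitting this check in the first place.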