Hadoop HA: NameNode cannot fail over automatically

tustyao posted on 2015-3-25 13:02:27
When both NameNodes are started, the one in standby state switches to ACTIVE. But when I manually kill the active NN, the standby NN cannot switch to the ACTIVE state.

Log:
2015-03-25 12:41:33,841 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from master port 22
2015-03-25 12:41:33,841 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to master as user root
com.jcraft.jsch.JSchException: Auth fail
        at com.jcraft.jsch.Session.connect(Session.java:452)
        at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
        at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2015-03-25 12:41:33,842 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2015-03-25 12:41:33,842 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2015-03-25 12:41:33,843 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at master/192.168.11.128:9000
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2015-03-25 12:41:33,843 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2015-03-25 12:41:33,855 INFO org.apache.zookeeper.ZooKeeper: Session: 0x34c4f2fe3a60003 closed
2015-03-25 12:41:34,857 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,node01:2181,node02:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@2d098f1f
2015-03-25 12:41:34,862 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server node01/192.168.11.129:2181. Will not attempt to authenticate using SASL (unknown error)
2015-03-25 12:41:34,863 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to node01/192.168.11.129:2181, initiating session
2015-03-25 12:41:34,868 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server node01/192.168.11.129:2181, sessionid = 0x24c4f2fad420002, negotiated timeout = 5000
2015-03-25 12:41:34,870 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2015-03-25 12:41:34,874 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2015-03-25 12:41:34,877 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2015-03-25 12:41:34,880 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a07636c757374657212036e6e311a066d617374657220a84628d33e
2015-03-25 12:41:34,883 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at master/192.168.11.128:9000
2015-03-25 12:41:35,892 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.11.128:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
2015-03-25 12:41:35,893 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at master/192.168.11.128:9000 standby (unable to connect)
java.net.ConnectException: Call From node01/192.168.11.129 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
        at org.apache.hadoop.ipc.Client.call(Client.java:1351)
        at org.apache.hadoop.ipc.Client.call(Client.java:1300)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy8.transitionToStandby(Unknown Source)
        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
        at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:503)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
        at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
        at org.apache.hadoop.ipc.Client.call(Client.java:1318)
        ... 14 more
2015-03-25 12:41:35,894 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2015-03-25 12:41:35,894 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2015-03-25 12:41:35,894 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to master...
2015-03-25 12:41:35,894 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to master port 22
2015-03-25 12:41:35,895 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
2015-03-25 12:41:35,903 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_6.4
2015-03-25 12:41:35,903 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
2015-03-25 12:41:35,903 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
2015-03-25 12:41:35,907 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
2015-03-25 12:41:35,907 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
2015-03-25 12:41:35,907 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
2015-03-25 12:41:35,907 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
2015-03-25 12:41:35,907 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
2015-03-25 12:41:35,908 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
2015-03-25 12:41:35,908 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
2015-03-25 12:41:35,908 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none
2015-03-25 12:41:35,908 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none
2015-03-25 12:41:35,911 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
2015-03-25 12:41:35,911 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
2015-03-25 12:41:35,917 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
2015-03-25 12:41:35,918 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'master' (RSA) to the list of known hosts.
2015-03-25 12:41:35,918 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
2015-03-25 12:41:35,918 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
2015-03-25 12:41:35,920 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
2015-03-25 12:41:35,920 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
2015-03-25 12:41:35,923 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
2015-03-25 12:41:35,923 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
2015-03-25 12:41:35,929 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
2015-03-25 12:41:35,929 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
2015-03-25 12:41:35,929 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: password
2015-03-25 12:41:35,929 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: password
2015-03-25 12:41:35,930 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from master port 22
2015-03-25 12:41:35,930 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to master as user root
com.jcraft.jsch.JSchException: Auth fail
        at com.jcraft.jsch.Session.connect(Session.java:452)
        at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
        at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2015-03-25 12:41:35,931 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2015-03-25 12:41:35,931 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2015-03-25 12:41:35,931 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at master/192.168.11.128:9000
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2015-03-25 12:41:35,932 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2015-03-25 12:41:35,942 INFO org.apache.zookeeper.ZooKeeper: Session: 0x24c4f2fad420002 closed
2015-03-25 12:41:36,944 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,node01:2181,node02:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@201cc181
2015-03-25 12:41:36,945 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server node01/192.168.11.129:2181. Will not attempt to authenticate using SASL (unknown error)
2015-03-25 12:41:36,946 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to node01/192.168.11.129:2181, initiating session
2015-03-25 12:41:36,979 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server node01/192.168.11.129:2181, sessionid = 0x24c4f2fad420003, negotiated timeout = 5000
2015-03-25 12:41:36,980 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2015-03-25 12:41:36,985 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2015-03-25 12:41:36,987 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2015-03-25 12:41:36,990 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a07636c757374657212036e6e311a066d617374657220a84628d33e
2015-03-25 12:41:36,991 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at master/192.168.11.128:9000
2015-03-25 12:41:38,005 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.11.128:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
2015-03-25 12:41:38,006 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at master/192.168.11.128:9000 standby (unable to connect)
java.net.ConnectException: Call From node01/192.168.11.129 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
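
Reading the log above: JSch tries gssapi-with-mic, then publickey, then password, and ends with "Auth fail", so the key-based login that sshfence depends on did not succeed for user root on master. A minimal Python sketch for pulling those facts out of a ZKFC log is shown here. The helper name `summarize_fence_attempt` and the trimmed sample lines are illustrative assumptions, not part of Hadoop, and the patterns expect one log entry per line.

```python
import re

# Hypothetical helper (not a Hadoop API): summarize one sshfence attempt
# from ZKFC log text that has one entry per line.
AUTH_METHOD = re.compile(r"Next authentication method: (\S+)")
FAILED_LOGIN = re.compile(r"Unable to connect to (\S+) as user (\S+)")

def summarize_fence_attempt(log_text):
    """Return the SSH auth methods tried and the (host, user) that failed."""
    methods = AUTH_METHOD.findall(log_text)
    failed = FAILED_LOGIN.search(log_text)
    return {
        "methods_tried": methods,
        "failed_login": failed.groups() if failed else None,
    }

# Trimmed lines taken from the log in this thread (timestamps removed).
sample = """\
INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: password
WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable to connect to master as user root
"""

print(summarize_fence_attempt(sample))
```

If publickey appears in the list and the attempt still ends in a failed login, the key the fencer offered was probably missing or rejected, which points at the key file configured in `dfs.ha.fencing.ssh.private-key-files` rather than at the cluster size.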

yuwenge posted on 2015-3-25 18:02:55
Verify that passwordless login works.
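
One way to make that check strict is to run ssh non-interactively, so it fails instead of prompting for a password. According to the Hadoop HA documentation, sshfence logs in with the key files listed in `dfs.ha.fencing.ssh.private-key-files`, so the check should use the same key, and it should be run from the standby's host toward the old active's host. The sketch below is only an illustration: the function names are hypothetical and the key path `/root/.ssh/id_rsa` is an assumption, not something stated in this thread.

```python
import subprocess

def build_ssh_check_command(host, user="root", port=22, key_file=None):
    """Build an ssh command that fails instead of prompting for a password."""
    cmd = [
        "ssh", "-p", str(port),
        "-o", "BatchMode=yes",       # never fall back to an interactive prompt
        "-o", "ConnectTimeout=5",
    ]
    if key_file:
        # Offer only this key, so an ssh-agent or a default key cannot mask a bad file.
        cmd += ["-o", "IdentitiesOnly=yes", "-i", key_file]
    cmd += [f"{user}@{host}", "true"]
    return cmd

def can_login_without_password(host, **kwargs):
    """Return True if the non-interactive login exits with status 0."""
    result = subprocess.run(build_ssh_check_command(host, **kwargs),
                            capture_output=True)
    return result.returncode == 0

# Assumed key path, for illustration only; this line just prints the command.
print(" ".join(build_ssh_check_command("master", key_file="/root/.ssh/id_rsa")))
```

Calling `can_login_without_password("master", key_file=...)` on node01 as the user that runs ZKFC would then show whether the fencer's login can succeed at all.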

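tustyao posted on 2015-3-25 23:19:10
Unable to connect to master as user root com.jcraft.jsch.JSchException: Auth fail. Looking at this, I also think it is a passwordless login problem, but all the nodes can log in to each other without a password. I don't know what is actually wrong!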

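desehawk posted on 2015-3-26 00:28:14
tustyao posted on 2015-3-25 23:19
Unable to connect to master as user root com.jcraft.jsch.JSchException: Auth fail. Looking at this, I also think it is a passwordless ...

Which account did you set up passwordless login with?
Was it root, or a regular account?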


韩克拉玛寒 posted on 2015-3-26 09:14:39
First, verify passwordless login between the machines. Second, which user are you running as, such that authentication fails?

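tustyao posted on 2015-3-26 09:40:29
First of all, thanks to everyone for the enthusiastic help. I am using the root user. Previously I logged in as a regular user and then switched to root to do the formatting, startup, and other operations. Later I noticed that the generated log file names were wrong, so I logged out, logged back in as root, reformatted, and regenerated the keys. As root, the nodes can log in to each other without a password. Could it be because there are too few nodes? I only set up 3 nodes; all of them run a DataNode and a JournalNode, and two of them run a NameNode.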

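langke93 posted on 2015-3-26 10:51:45
tustyao posted on 2015-3-26 09:40
First of all, thanks to everyone for the enthusiastic help. I am using the root user. Previously I logged in as a regular user and then switched to root to do the formatting, startup, and other operations. Later ...

Check SSH for root: confirm whether remote SSH login as root is allowed.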
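
Whether root may log in over SSH is controlled by the `PermitRootLogin` keyword in the server's sshd_config. A rough Python sketch for reading that setting from the file's text is given here; `permit_root_login` is a hypothetical helper, it looks only at the global section before any `Match` block, and it returns None when the keyword is unset because the built-in default depends on the OpenSSH version.

```python
def permit_root_login(sshd_config_text):
    """Return the first global PermitRootLogin value (lowercased), or None if unset."""
    for raw in sshd_config_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()            # sshd_config keywords are case-insensitive
        if keyword == "match":
            break                             # simplification: ignore conditional blocks
        if keyword == "permitrootlogin" and len(parts) == 2:
            return parts[1].strip().lower()   # sshd uses the first value it finds
    return None

# Illustrative config text, not taken from the poster's machines.
sample = """\
# Authentication:
#PermitRootLogin yes
PermitRootLogin no
PubkeyAuthentication yes
"""

print(permit_root_login(sample))
```

A value of `no` blocks root entirely, while `without-password` (or `prohibit-password` on newer OpenSSH releases) still permits key-based root login, which is the kind sshfence needs.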

awenkidz posted on 2015-4-15 13:29:08
I use 3 JournalNodes to share the metadata of 2 NNs, and it works.

bigye posted on 2015-5-23 23:18:19
I ran into the same problem today, though I can rule out the original poster's cause. The error is as follows:
hadoop@suse02:~>hdfs haadmin -transitionToActive nn2
INFO ipc.Client: Retrying connect to server: suse01/172.20.32.41:8020. Already tried 0 time(s); retry ...
Unexpected error occurred Call From suse02/172.20.32.42 to suse01:8020 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see: ....
Any advice would be appreciated.
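
"Connection refused" here means nothing accepted the TCP connection on suse01:8020 at that moment, for example because the NameNode process on suse01 is down or is listening on a different address. A small Python sketch of a port probe that separates "port closed" from other failures of the haadmin command follows; `port_is_open` is a hypothetical helper name.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Running `port_is_open("suse01", 8020)` from suse02 would reproduce the check that failed in the error above without going through Hadoop's RPC client.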

aLivable posted on 2017-12-1 23:28:58
OP, please help!!!