Summary of Problems Encountered When Upgrading Hadoop

Posted by pig2 on 2017-11-17 15:15:53
Last edited by pig2 on 2017-11-20 18:45


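Upgrading Hadoop is much the same as upgrading Spark. The common part: back up the existing installation, replace it with the new release, then carry over the configuration files and environment variables. The differences are in the details: for example, individual configuration options may have been deprecated or renamed.
The previous post covered:
How to upgrade Spark 1.x to Spark 2 and what to consider
http://www.aboutyun.com/forum.php?mod=viewthread&tid=23315
This post does not describe the Hadoop upgrade procedure itself in detail. For that, see:
Guide to upgrading a Hadoop 1.x cluster to Hadoop 2.x and points to note
http://www.aboutyun.com/forum.php?mod=viewthread&tid=6551
More can be found by searching.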

Environment: Hadoop 2.6.5 upgraded to Hadoop 2.7.4
The problems encountered were as follows:

Problem 1
After the upgrade, HDFS must be started with the -upgrade option. Without it, the NameNode fails to start:
[mw_shl_code=bash,true]2017-11-16 17:39:34,398 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException:
File system image contains an old layout version -60.
An upgrade to version -63 is required.
Please restart NameNode with the "-rollingUpgrade started" option if a rolling upgrade is already started; or restart NameNode with the "-upgrade" option to start a new upgrade.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:263)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:978)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:685)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:819)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:803)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1500)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1566)
2017-11-16 17:39:34,443 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50070
2017-11-16 17:39:34,457 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2017-11-16 17:39:34,457 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2017-11-16 17:39:34,458 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2017-11-16 17:39:34,458 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException:
File system image contains an old layout version -60.
An upgrade to version -63 is required.
Please restart NameNode with the "-rollingUpgrade started" option if a rolling upgrade is already started; or restart NameNode with the "-upgrade" option to start a new upgrade.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:263)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:978)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:685)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:819)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:803)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1500)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1566)
2017-11-16 17:39:34,459 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-11-16 17:39:34,463 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.1.10
************************************************************/[/mw_shl_code]


Solution:
[mw_shl_code=bash,true] start-dfs.sh -upgrade
[/mw_shl_code]
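
As a rough sketch of how this check could be scripted: the function below looks for the "old layout version" message in a NameNode log and prints the old and required layout versions. The function name and the log path you pass in are assumptions for illustration, not part of Hadoop. The commented commands at the end are standard Hadoop CLI calls. Note that an upgrade started with -upgrade stays un-finalized until `hdfs dfsadmin -finalizeUpgrade` is run, and rollback is no longer possible after finalizing.

```shell
#!/usr/bin/env bash
# Sketch: detect the "old layout version" error in a NameNode log.
# layout_upgrade_needed is a hypothetical helper; the log file path is
# whatever your installation uses.

# Prints "<old> -> <required>" if the log asks for an upgrade,
# returns 1 when no such message is found.
layout_upgrade_needed() {
    local log_file="$1"
    local old new
    # Take the first matching line only, then quit.
    old=$(sed -n '/old layout version/{s/.*old layout version \(-[0-9][0-9]*\).*/\1/p;q;}' "$log_file")
    new=$(sed -n '/upgrade to version/{s/.*upgrade to version \(-[0-9][0-9]*\).*/\1/p;q;}' "$log_file")
    if [ -z "$old" ] || [ -z "$new" ]; then
        return 1
    fi
    echo "$old -> $new"
}

# Typical flow once the message is confirmed:
#   start-dfs.sh -upgrade            # start HDFS and upgrade the on-disk layout
#   hdfs dfsadmin -report            # confirm the DataNodes have registered
#   hdfs dfsadmin -finalizeUpgrade   # only after verifying the cluster
```

For the log shown above, the function would report the layout change from -60 to -63.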

Problem 2
Error when starting YARN (the message 地址已在使用 in the log means "Address already in use"):
[mw_shl_code=bash,true]2017-11-16 17:47:00,474 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized CapacityScheduler with calculator=class org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, minimumAllocation=<<memory:1024, vCores:1>>, maximumAllocation=<<memory:8192, vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms
2017-11-16 17:47:00,475 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [master:8031] java.net.BindException: 地址已在使用; For more details see:  http://wiki.apache.org/hadoop/BindException
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [master:8031] java.net.BindException: 地址已在使用; For more details see:  http://wiki.apache.org/hadoop/BindException
        at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
        at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
        at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.serviceStart(ResourceTrackerService.java:163)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:584)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:972)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1013)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1009)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1049)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1186)
Caused by: java.net.BindException: Problem binding to [master:8031] java.net.BindException: 地址已在使用; For more details see:  http://wiki.apache.org/hadoop/BindException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
        at org.apache.hadoop.ipc.Server.bind(Server.java:484)
        at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:690)
        at org.apache.hadoop.ipc.Server.<init>(Server.java:2379)
        at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:951)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:534)
        at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:509)
        at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796)
        at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:173)
        at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
        ... 17 more
Caused by: java.net.BindException: 地址已在使用
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.apache.hadoop.ipc.Server.bind(Server.java:467)
        ... 25 more
2017-11-16 17:47:00,488 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
2017-11-16 17:47:00,488 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to standby state
2017-11-16 17:47:00,488 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [master:8031] java.net.BindException: 地址已在使用; For more details see:  http://wiki.apache.org/hadoop/BindException
        at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
        at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
        at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.serviceStart(ResourceTrackerService.java:163)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:584)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:972)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1013)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1009)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1049)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1186)
Caused by: java.net.BindException: Problem binding to [master:8031] java.net.BindException: 地址已在使用; For more details see:  http://wiki.apache.org/hadoop/BindException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
        at org.apache.hadoop.ipc.Server.bind(Server.java:484)
        at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:690)
        at org.apache.hadoop.ipc.Server.<init>(Server.java:2379)
        at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:951)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:534)
        at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:509)
        at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796)
        at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:173)
        at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
        ... 17 more
Caused by: java.net.BindException: 地址已在使用
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.apache.hadoop.ipc.Server.bind(Server.java:467)
        ... 25 more
2017-11-16 17:47:00,493 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at master/192.168.1.10[/mw_shl_code]


Check which process is using the port:
[mw_shl_code=bash,true]netstat -anp|grep 8031
tcp6       0      0 192.168.1.10:8031       :::*                    LISTEN      24571/java         
tcp6       0      0 192.168.1.10:8031       192.168.1.30:55982      ESTABLISHED 24571/java         
tcp6       0      0 192.168.1.10:8031       192.168.1.20:44126      ESTABLISHED 24571/java   [/mw_shl_code]

Solution: kill the process that holds the port (PID 24571 in the netstat output above):
[mw_shl_code=bash,true] kill -9 24571
[/mw_shl_code]
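
The lookup and the kill can be combined in a small script. The following is a minimal sketch, not part of the original post: `listener_pid` and `stop_pid` are hypothetical helper names, and the script assumes `netstat -anp` output in the format shown above (state in column 6, PID/program in column 7). It sends SIGTERM first and uses `kill -9` only if the process is still alive after a short wait, which gives a leftover daemon the chance to shut down cleanly.

```shell
#!/usr/bin/env bash
# Sketch: find the process listening on a TCP port and stop it.
# Both function names are hypothetical helpers for illustration.

# Reads `netstat -anp` output on stdin and prints the PID of the
# process in LISTEN state on the given port (empty if none).
listener_pid() {
    local port="$1"
    awk -v port="$port" '
        $6 == "LISTEN" && $4 ~ (":" port "$") {
            split($7, parts, "/")   # column 7 looks like "24571/java"
            print parts[1]
            exit
        }'
}

# Sends SIGTERM, waits up to wait_secs seconds, then falls back to SIGKILL.
stop_pid() {
    local pid="$1" wait_secs="${2:-10}" i
    kill "$pid" 2>/dev/null || return 0      # nothing to signal
    for i in $(seq "$wait_secs"); do
        kill -0 "$pid" 2>/dev/null || return 0   # process has exited
        sleep 1
    done
    kill -9 "$pid"
}

# Usage (run on the affected node):
#   pid=$(sudo netstat -anp | listener_pid 8031)
#   [ -n "$pid" ] && stop_pid "$pid"
```

The same two calls work for any other port reported in a BindException, such as 8040 on a NodeManager host.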

Problem 3
The NodeManager fails to start:
[mw_shl_code=bash,true]2017-11-16 17:51:07,956 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [0.0.0.0:8040] java.net.BindException: 地址已在使用; For more details see:  http://wiki.apache.org/hadoop/BindException
        at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
        at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
        at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.createServer(ResourceLocalizationService.java:359)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.serviceStart(ResourceLocalizationService.java:337)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:457)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:272)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:496)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:543)
Caused by: java.net.BindException: Problem binding to [0.0.0.0:8040] java.net.BindException: 地址已在使用; For more details see:  http://wiki.apache.org/hadoop/BindException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
        at org.apache.hadoop.ipc.Server.bind(Server.java:484)
        at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:690)
        at org.apache.hadoop.ipc.Server.<init>(Server.java:2379)
        at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:951)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:534)
        at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:509)
        at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796)
        at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:173)
        at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
        ... 13 more
Caused by: java.net.BindException: 地址已在使用
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.apache.hadoop.ipc.Server.bind(Server.java:467)
        ... 21 more
2017-11-16 17:51:07,963 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at slave2/192.168.1.30[/mw_shl_code]

Same problem, same solution:
[mw_shl_code=bash,true]sudo netstat -anp|grep 8040
kill -9 20364[/mw_shl_code]




