
pig2 posted on 2015-2-3 18:21:04
Installing an HBase cluster from a tarball (CDH 5.0.2): HBase errors caused by Hadoop NameNode HA failover, and notes on configuring HBase against the NameNode HA nameservice


When I first configured HBase, I followed the instructions on the CDH site and set hbase.rootdir to the nameservice defined for Hadoop NameNode HA:
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://hbasecluster/hbase</value>
</property>

With that configuration, starting the cluster with start-hbase.sh brought up only the HMaster listed in the backup-masters file; every RegionServer (co-located with the Hadoop DataNodes) failed to start, reporting that it could not connect to hdfs://hbasecluster:8020. A web search turned up nothing at the time, so I fell back to pointing hbase.rootdir directly at a single NameNode:
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://nn1:8020/hbase</value>
</property>

With that change HBase started successfully, and both the hbase shell and a remote Java client test passed, so I gradually forgot about this leftover configuration issue. Yesterday, on a whim, I tested NameNode HA failover again. The failover itself succeeded without problems, but the Java client could no longer reach HBase. On the servers, the HMaster process on the primary node zk1 was gone, and although the HMaster processes on zk2 and zk3 were still running, they could not be connected to either. I restarted the HBase cluster from zk1:
stop-hbase.sh
start-hbase.sh
jps

jps then showed the HMaster process disappearing almost immediately, and its log contained the following error:

master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1565)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1183)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3492)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:764)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:764)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
    at org.apache.hadoop.ipc.Client.call(Client.java:1409)
    at org.apache.hadoop.ipc.Client.call(Client.java:1362)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:699)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1757)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
    at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:438)
    at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:146)
    at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:127)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:789)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:606)
    at java.lang.Thread.run(Thread.java:745)
2014-07-24 08:51:47,025 INFO  [master:zk1:60000] master.HMaster: Aborting
2014-07-24 08:51:47,026 DEBUG [master:zk1:60000] master.HMaster: Stopping service threads
Looking closely, the error was actually coming from the NameNode. Checking nn1, I found it was now in standby state while nn2 had become active, and I suddenly remembered the previous day's failover test. That spurred me to solve the problem properly, and after a round of googling I finally found the following lines in the HBase reference guide on the official HBase site:
2.2.2.2.3. HDFS Client Configuration
Of note, if you have made HDFS client configuration on your Hadoop cluster -- i.e. configuration you want HDFS clients to use as opposed to server-side configurations -- HBase will not see this configuration unless you do one of the following:
    Add a pointer to your HADOOP_CONF_DIR to the HBASE_CLASSPATH environment variable in hbase-env.sh.
    Add a copy of hdfs-site.xml (or hadoop-site.xml) or, better, symlinks, under ${HBASE_HOME}/conf, or
    if only a small set of HDFS client configurations, add them to hbase-site.xml.
An example of such an HDFS client configuration is dfs.replication. If for example, you want to run with a replication factor of 5, hbase will create files with the default of 3 unless you do the above to make the configuration available to HBase.
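The first of those options can be done entirely in hbase-env.sh. A minimal sketch, assuming the Hadoop configuration directory is /opt/hadoop/etc/hadoop (that path is an assumption, not from this post; substitute your actual HADOOP_CONF_DIR):

# hbase-env.sh: option 1 above, put the Hadoop client configuration on HBase's classpath
# (the directory below is an assumed example; use your own HADOOP_CONF_DIR)
export HBASE_CLASSPATH=$HBASE_CLASSPATH:/opt/hadoop/etc/hadoop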

I went with the second method: on every master and RegionServer node I created a symlink to hdfs-site.xml under the HBase conf directory (note that distributing the symlink with scp does not work, because the links arrive as real copies of the file), set hbase.rootdir back to the nameservice form, and started the HBase cluster again: success!
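A sketch of that second method, to be run on each HMaster and RegionServer node (both install paths below are assumptions; substitute your own Hadoop and HBase directories):

# Option 2 above: symlink the HDFS client config into HBase's conf directory.
# Run this on every node rather than scp'ing the link, since scp follows
# symlinks and copies the real file instead.
ln -s /opt/hadoop/etc/hadoop/hdfs-site.xml /opt/hbase/conf/hdfs-site.xml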

I then tested Hadoop NameNode failover once more, and HBase kept working. The problem that had troubled me for days was finally solved.
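For completeness, these are the HDFS client-side HA properties that HBase has to be able to read (through the symlinked hdfs-site.xml, the HBASE_CLASSPATH pointer, or, per the third option, copied into hbase-site.xml) before a URI like hdfs://hbasecluster/hbase can resolve. This is only a sketch built from the names used in this post (hbasecluster, nn1, nn2, port 8020), assuming the NameNode IDs match the host names; the real values must mirror your Hadoop hdfs-site.xml:

<!-- HDFS client-side HA settings: a sketch; values must match the Hadoop configuration -->
<property>
    <name>dfs.nameservices</name>
    <value>hbasecluster</value>
</property>
<property>
    <name>dfs.ha.namenodes.hbasecluster</name>
    <value>nn1,nn2</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.hbasecluster.nn1</name>
    <value>nn1:8020</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.hbasecluster.nn2</name>
    <value>nn2:8020</value>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.hbasecluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

After a failover, which NameNode is active can be checked with hdfs haadmin -getServiceState nn1 (and nn2), using the NameNode IDs from the configuration above.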






Comments (6)


supertianxiang posted on 2016-1-12 10:37:07
Hi OP, I've run into the same problem (hadoop 2.4.1, hbase 0.98), but your approach didn't solve it for me:
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://hbasecluster/hbase</value>   <!-- Do I need to give hbasecluster an IP entry in /etc/hosts? As soon as I start HBase it complains that hbasecluster is unknown; after adding a hosts entry the cluster works, but that introduces the new problem you describe here. -->
</property>

xuanxufeng posted on 2016-1-12 10:53:03
Quoting supertianxiang (2016-1-12 10:37): Hi OP, I've run into the same problem (hadoop 2.4.1, hbase 0.98), but your approach didn't solve it for me ...

hbasecluster definitely does not need a hosts entry; it is the logical nameservice from the HA configuration, not a real host.

supertianxiang posted on 2016-1-12 11:05:45
Quoting xuanxufeng (2016-1-12 10:53): hbasecluster definitely does not need a hosts entry; it is the logical nameservice from the HA configuration.

Right, I've read quite a bit of material and it all says the same thing, but if I don't add the hosts entry I get the error below:

2016-01-12 10:39:24,879 INFO  [main] util.ServerCommandLine: env:HBASE_THRIFT_OPTS=-Xms2g -Xmx2g
2016-01-12 10:39:24,880 INFO  [main] util.ServerCommandLine: env:QTINC=/usr/lib64/qt-3.3/include
2016-01-12 10:39:24,880 INFO  [main] util.ServerCommandLine: env:USER=hadoop
2016-01-12 10:39:24,880 INFO  [main] util.ServerCommandLine: env:HBASE_CLASSPATH= /export/distributed/hadoop/hadoop-2.4.1/etc
2016-01-12 10:39:24,880 INFO  [main] util.ServerCommandLine: env:HOME=/home/hadoop
2016-01-12 10:39:24,880 INFO  [main] util.ServerCommandLine: env:HISTCONTROL=ignoredups
2016-01-12 10:39:24,880 INFO  [main] util.ServerCommandLine: env:LESSOPEN=|/usr/bin/lesspipe.sh %s
2016-01-12 10:39:24,880 INFO  [main] util.ServerCommandLine: env:LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
2016-01-12 10:39:24,881 INFO  [main] util.ServerCommandLine: env:HBASE_LOG_PREFIX=hbase-hadoop-master-M-172-16-73-194
2016-01-12 10:39:24,881 INFO  [main] util.ServerCommandLine: env:LANG=zh_CN.UTF-8
2016-01-12 10:39:24,881 INFO  [main] util.ServerCommandLine: env:HBASE_IDENT_STRING=hadoop
2016-01-12 10:39:24,883 INFO  [main] util.ServerCommandLine: vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Oracle Corporation, vmVersion=24.65-b04
2016-01-12 10:39:24,884 INFO  [main] util.ServerCommandLine: vmInputArguments=[-Dproc_master, -XX:OnOutOfMemoryError=kill -9 %p, -Xmx4192m, -XX:+HeapDumpOnOutOfMemoryError, -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+UseParNewGC, -Xmn1024m, -XX:CMSInitiatingOccupancyFraction=70, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -Dhbase.log.dir=/export/distributed/hbase/hbase-0.98.9-hadoop2/bin/../logs, -Dhbase.log.file=hbase-hadoop-master-M-172-16-73-194.log, -Dhbase.home.dir=/export/distributed/hbase/hbase-0.98.9-hadoop2/bin/.., -Dhbase.id.str=hadoop, -Dhbase.root.logger=INFO,RFA, -Djava.library.path=/export/distributed/hadoop/hadoop-2.4.1/lib/native, -Dhbase.security.logger=INFO,RFAS]
2016-01-12 10:39:25,051 DEBUG [main] master.HMaster: master/M-172-16-73-194/172.16.73.194:60000 HConnection server-to-server retries=350
2016-01-12 10:39:25,374 INFO  [main] ipc.RpcServer: master/M-172-16-73-194/172.16.73.194:60000: started 10 reader(s).
2016-01-12 10:39:25,525 INFO  [main] impl.MetricsConfig: loaded properties from hadoop-metrics2-hbase.properties
2016-01-12 10:39:25,664 INFO  [main] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-01-12 10:39:25,664 INFO  [main] impl.MetricsSystemImpl: HBase metrics system started
2016-01-12 10:39:26,548 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3017)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:186)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:135)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3031)
Caused by: java.net.UnknownHostException: HADOOPCLUSTER1    <-- this is my cluster's nameservice name; without the hosts entry, startup fails right here
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:240)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:144)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:579)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:524)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:927)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:533)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3012)

xuanxufeng posted on 2016-1-12 11:15:34
Quoting supertianxiang (2016-1-12 11:05): Right, I've read quite a bit of material and it all says the same thing, but without the hosts entry I get the error below ...

The problem isn't there; look at the rest of your configuration.

supertianxiang posted on 2016-1-12 16:49:26
Quoting xuanxufeng (2016-1-12 11:15): The problem isn't there; look at the rest of your configuration.

Thanks a lot, the problem is solved. It was a dfs.client.failover.proxy.provider configuration issue.
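For anyone landing here with the same symptom: that property name is suffixed with the nameservice ID, so for the HADOOPCLUSTER1 nameservice seen in the log above it would look roughly like the sketch below, using Hadoop's standard HA failover provider class (the post does not spell out the exact entry that was used, so treat this only as an illustration):

<property>
    <name>dfs.client.failover.proxy.provider.HADOOPCLUSTER1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>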
