
DataNode fails to start, any help appreciated

追随云科技 posted on 2016-5-25 20:57:01
STARTUP_MSG:   build = http://github.com/cloudera/hadoop -r c00978c67b0d3fe9f3b896b5030741bd40bf541a; compiled by 'jenkins' on 2016-03-23T18:41Z
STARTUP_MSG:   java = 1.7.0_79
************************************************************/
2016-05-25 17:11:07,627 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2016-05-25 17:11:09,744 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-05-25 17:11:10,761 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2016-05-25 17:11:10,961 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-05-25 17:11:10,962 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2016-05-25 17:11:10,991 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576
2016-05-25 17:11:10,995 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: File descriptor passing is disabled because libhadoop cannot be loaded.
2016-05-25 17:11:10,999 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is hadoopdatanode1
2016-05-25 17:11:11,035 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 0
2016-05-25 17:11:11,138 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:50010
2016-05-25 17:11:11,146 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
NameNode output:
[root@hadoopnamenode sbin]# ./stop-dfs.sh
16/05/25 20:53:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [hadoopnamenode]
hadoopnamenode: stopping namenode
hadoopdatanode2: no datanode to stop
hadoopdatanode1: no datanode to stop
hadoopdatanode3: no datanode to stop
Stopping secondary namenodes [hadoop2ndnamenode]
hadoop2ndnamenode: stopping secondarynamenode
16/05/25 20:53:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable




DataNode log:
2016-05-25 17:11:11,146 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 5
2016-05-25 17:11:11,150 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.lang.RuntimeException: Although a UNIX domain socket path is configured as /opt/data/hadoop/hdfs/dn._PORT, we cannot start a localDataXceiverServer because libhadoop cannot be loaded.
        at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:962)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:933)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1137)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:451)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2406)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2293)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2340)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2517)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2541)
2016-05-25 17:11:11,160 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2016-05-25 17:11:11,165 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hadoopdatanode1/172.29.140.172
************************************************************/


1 comment

einhep posted on 2016-5-26 06:19:38
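First possible cause: the namespaceID on the Master and on the Slaves does not match.
Fix:
1) Change the namespaceID on each Slave so that it matches the Master's namespaceID.
or
2) Change the Master's namespaceID so that it matches the Slaves' namespaceID.
The Master's "namespaceID" is in the file "/usr/hadoop/tmp/dfs/name/current/VERSION"; a Slave's "namespaceID" is in the file "/usr/hadoop/tmp/dfs/data/current/VERSION".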
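
Before editing anything by hand, it may help to confirm that the two IDs really differ. The following is only a minimal sketch: it assumes the VERSION files are plain key=value text, and the function names `read_version_properties` and `namespace_ids_match` are made up for this illustration (they are not part of Hadoop).

```python
from pathlib import Path


def read_version_properties(path):
    """Parse a VERSION file (assumed to be key=value lines) into a dict."""
    props = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comment lines, and anything without a key=value shape.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def namespace_ids_match(name_version, data_version):
    """Return (match, master_id, slave_id) for the two VERSION files."""
    master_id = read_version_properties(name_version).get("namespaceID")
    slave_id = read_version_properties(data_version).get("namespaceID")
    match = master_id is not None and master_id == slave_id
    return match, master_id, slave_id
```

To use it, copy the Slave's VERSION file next to the Master's (or the other way round) and call `namespace_ids_match` with the two paths quoted above; a result whose first element is `False` points to the mismatch described here.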


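Second possible cause:
When Hadoop stops, it relies on the process IDs (PIDs) of the mapred and dfs daemons on the datanode. By default the PID files are stored under /tmp, and Linux by default deletes files in that directory periodically (typically about once a month or every 7 days). So once the two files hadoop-hadoop-jobtracker.pid and hadoop-hadoop-namenode.pid have been deleted, the namenode can no longer find those two processes on the datanode.
Setting export HADOOP_PID_DIR in the configuration file hadoop-env.sh solves this problem.
In the configuration file, the default path for HADOOP_PID_DIR is "/var/hadoop/pids". Manually create a "hadoop" folder under "/var" (skip this if it already exists), and remember to use chown to give ownership to the hadoop user. Then, on the failing Slave, kill the Datanode and Tasktracker processes (kill -9 <PID>) and run start-all.sh again. If "no datanode to stop" no longer appears when you run stop-all.sh, the problem is solved.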
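
To check whether a missing PID file is really what is happening on a node, something like the sketch below could be run there. It is an illustration only: the helper name `pid_file_status` is hypothetical, the file name pattern hadoop-<user>-<daemon>.pid is inferred from the two file names mentioned above, and the liveness check relies on Unix signal 0.

```python
import os
from pathlib import Path


def pid_file_status(pid_dir, user, daemon):
    """Classify a daemon's PID file as 'missing', 'stale', or 'running'.

    Assumes the file is named hadoop-<user>-<daemon>.pid and holds one integer.
    """
    pid_file = Path(pid_dir) / f"hadoop-{user}-{daemon}.pid"
    if not pid_file.is_file():
        return "missing"  # what a cleaned-up /tmp would look like
    try:
        pid = int(pid_file.read_text().strip())
    except ValueError:
        return "stale"  # file exists but does not contain a PID
    if pid <= 0:
        return "stale"
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return "stale"  # no process with that PID any more
    except PermissionError:
        return "running"  # process exists but belongs to another user
    return "running"
```

For example, `pid_file_status("/tmp", "hadoop", "datanode")` returning "missing" while a DataNode process is still visible in the process list would be consistent with the explanation above.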


