
Running wordcount on Hadoop 2.7

Questions this post answers:

1. How was "Call From: ubuntu to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused" resolved here?
2. What preparation was needed before running wordcount?
3. How do you view the run's results?





Continuing from the previous article, "hadoop2.7【单节点】单机、伪分布、分布式安装指导" (Hadoop 2.7 single-node guide: standalone, pseudo-distributed, and distributed installation).

First create the HDFS directories for the current user:

  hdfs dfs -mkdir /user
  hdfs dfs -mkdir /user/aboutyun
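As an aside, on Hadoop 2.x the two calls can also be collapsed into one with the -p flag, which creates missing parent directories (a small shortcut, not what the original run used):

  hdfs dfs -mkdir -p /user/aboutyun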

Then upload the Hadoop configuration files as job input:

  hdfs dfs -put etc/hadoop input
#####################################

Note the working directory for this command:

  hdfs dfs -put etc/hadoop input

It must be run from hadoop_home, which here is ~/hadoop-2.7.0.
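Spelled out, the intended invocation is (a sketch assuming the install directory above):

  cd ~/hadoop-2.7.0
  # etc/hadoop is a local path resolved relative to the current directory;
  # input is a relative HDFS path that resolves under /user/aboutyun
  hdfs dfs -put etc/hadoop input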

Running it produced the following error:
  put: File /user/aboutyun/input/yarn-env.sh._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
  15/04/27 08:16:30 WARN hdfs.DFSClient: DataStreamer Exception
  org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/aboutyun/input/yarn-site.xml._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
          at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
          at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3067)
          at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:722)
          at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
          at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
          at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
          at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
          at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:415)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
          at org.apache.hadoop.ipc.Client.call(Client.java:1476)
          at org.apache.hadoop.ipc.Client.call(Client.java:1407)
          at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
          at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
          at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
          at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:606)
          at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
          at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
          at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
          at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1430)
          at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1226)
          at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
  put: File /user/aboutyun/input/yarn-site.xml._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
jps showed all the daemons running, yet the command above still failed, so I restarted HDFS. During the restart it printed "no datanode to stop": the datanode had apparently become a zombie process.
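For reference, the process check is just jps; on this pseudo-distributed setup the HDFS daemons expected in its output are NameNode, DataNode, and SecondaryNameNode:

  jps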


I started HDFS again:

  start-dfs.sh


It still failed, so I checked the datanode log again:
2015-04-27 08:28:05,274 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /tmp/hadoop-aboutyun/dfs/data/in_use.lock acquired by nodename 13969@ubuntu
2015-04-27 08:28:05,278 WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-aboutyun/dfs/data: namenode clusterID = CID-adabf762-f2f4-43b9-a807-7501f83a9176; datanode clusterID = CID-5c0474f8-7030-4fbc-bb79-6c9163afc5b8

2015-04-27 08:28:05,279 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000. Exiting.
java.io.IOException: All specified directories are failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1387)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1352)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:228)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:852)
        at java.lang.Thread.run(Thread.java:744)
2015-04-27 08:28:05,283 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000
2015-04-27 08:28:05,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2015-04-27 08:28:07,306 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2015-04-27 08:28:07,309 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2015-04-27 08:28:07,310 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at ubuntu/127.0.1.1
************************************************************/

The fix: go into /tmp/hadoop-aboutyun/dfs/data and edit the VERSION file so that the datanode's clusterID matches the namenode's clusterID from the log above.
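A minimal sketch of that edit (paths match this setup's default /tmp storage; note the VERSION files actually sit one level down, in a current/ subdirectory, which is easy to miss):

  # read the namenode's clusterID
  grep clusterID /tmp/hadoop-aboutyun/dfs/name/current/VERSION
  # make the datanode's VERSION carry the same clusterID value
  vi /tmp/hadoop-aboutyun/dfs/data/current/VERSION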

Then stop the cluster:

  stop-dfs.sh

and start it again:

  start-dfs.sh


Verification: when stopping the cluster you now see "stop datanode", which shows the edit took effect.

---------------------------------------------------------------------------
Then I ran the upload again:

  hdfs dfs -put etc/hadoop input

but hit the same "could only be replicated to 0 nodes instead of minReplication (=1)" error and DataStreamer stack trace shown above. Listing the directory failed as well:

  ls: Call From java.net.UnknownHostException: ubuntu: ubuntu to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Solution: edit /etc/hosts, and remember to comment out the 127.0.1.1 ubuntu line:

127.0.0.1       localhost
#127.0.1.1      ubuntu
10.0.0.81       ubuntu

Many people hit this error, and reported fixes also include turning off the firewall and the like, but here the cure was commenting out 127.0.1.1.
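Two quick checks after editing /etc/hosts (a sketch; 9000 is the NameNode RPC port configured in this setup):

  ping -c 1 ubuntu        # should resolve to 10.0.0.81 now
  hdfs dfsadmin -report   # should reach the namenode and report one live datanode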
----------------------------------------------------------------------------
The same error appeared once more:

  ls: Call From java.net.UnknownHostException: ubuntu: ubuntu to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

Checking the namenode log:

  more hadoop-aboutyun-namenode-ubuntu.log

turned up:

  Directory /tmp/hadoop-aboutyun/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
So the name directory was missing. Under

  /tmp/hadoop-aboutyun/dfs/

I created it by hand with mkdir name, then reformatted the namenode, and this time it succeeded.
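The recovery, as a sketch under the same /tmp layout:

  mkdir /tmp/hadoop-aboutyun/dfs/name
  hdfs namenode -format   # the reformat step described above
  start-dfs.sh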
#####################################

Next, view the files that were uploaded by the command:

  hdfs dfs -put etc/hadoop input

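To list what actually landed in HDFS (a sketch):

  hdfs dfs -ls input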



Mind the path here too: cd into hadoop_home first, then run:

  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'
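The title of this post is wordcount; the grep job above comes from the official quick-start, but the same examples jar ships a wordcount driver as well. A sketch (output-wordcount is an arbitrary path chosen here so it does not collide with the grep job's output):

  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount input output-wordcount
  hdfs dfs -cat output-wordcount/part-r-00000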




View the grep job's result with:

  hdfs dfs -cat /user/aboutyun/output/part-r-00000
which prints:

  6       dfs.audit.logger
  4       dfs.class
  3       dfs.server.namenode.
  2       dfs.period
  2       dfs.audit.log.maxfilesize
  2       dfs.audit.log.maxbackupindex
  1       dfsmetrics.log
  1       dfsadmin
  1       dfs.servers
  1       dfs.replication
  1       dfs.permissions
  1       dfs.file
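The absolute path is not required; relative HDFS paths resolve under the user's HDFS home directory (/user/aboutyun here), so the same file can be read with:

  hdfs dfs -cat output/part-r-00000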






##################################

Finally, pull the output back to the local filesystem:

  hdfs dfs -get output output
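After the get, the files sit in ./output on the local filesystem; a quick sanity check (a sketch):

  ls output
  cat output/part-r-00000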

The get command raised a warning:

  WARN hdfs.DFSClient: DFSInputStream has been closed already

which I am leaving to resolve later.



---------------------------------------------------------------------------
Comments (9)
jseven, 2015-5-15 15:59:27:
Many thanks. Following these two articles I got hadoop 2.8 installed successfully. As for that final warning, there is a record on the Apache issue tracker:
https://issues.apache.org/jira/browse/HDFS-8099
The suggested fix is to patch the source and lower the message from WARN to DEBUG:
diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
index cf8015f..9f7b15c 100644
--- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
+++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
@@ -666,7 +666,7 @@ private synchronized DatanodeInfo blockSeekTo(long target) throws IOException {
   @Override
   public synchronized void close() throws IOException {
     if (!closed.compareAndSet(false, true)) {
-      DFSClient.LOG.warn("DFSInputStream has been closed already");
+      DFSClient.LOG.debug("DFSInputStream has been closed already");
       return;
     }
     dfsClient.checkOpen();

jkdcdlly, 2015-6-19 10:15:54:
OP: you say "go into /tmp/hadoop-aboutyun/dfs/data and edit the VERSION file", but I don't have that directory.


jkdcdlly, 2015-6-19 10:23:56:
OK, found it.

ashic, 2015-7-23 22:54:19:
Has the "WARN hdfs.DFSClient: DFSInputStream has been closed already" issue been resolved?

ableq, 2015-11-2 10:31:22:
After installing hadoop 2.7.1, are the input and output directories created automatically? I didn't see a step to create them manually.

chinaboy, 2016-4-28 14:25:50, replying to ableq:
input and output are paths inside HDFS; input is created by the hdfs dfs -put command above, and the job creates output itself.

