pig2 posted on 2015-4-28 14:22:57

Running wordcount on hadoop2.7

Guide questions

1. How does this article resolve "Call From ubuntu to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused"?
2. What preparation does this article do before running wordcount?
3. How do you view the run results?




Continued from the previous article:
hadoop2.7 [single node] standalone, pseudo-distributed, and distributed installation guide

hdfs dfs -mkdir /user


hdfs dfs -mkdir /user/aboutyun



hdfs dfs -put etc/hadoop input

#####################################

Note the working directory for this command:
hdfs dfs -put etc/hadoop input
It must be run from hadoop_home, which here is ~/hadoop-2.7.0.
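For reference, a minimal end-to-end sketch of the upload step (assuming hadoop_home is ~/hadoop-2.7.0 as above and bin is not on the PATH):

cd ~/hadoop-2.7.0
bin/hdfs dfs -mkdir -p /user/aboutyun      # same effect as the two mkdir commands above
bin/hdfs dfs -put etc/hadoop input         # the relative path "input" resolves to /user/aboutyun/input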


While running this command, the following error appeared:
put: File /user/aboutyun/input/yarn-env.sh._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
15/04/27 08:16:30 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/aboutyun/input/yarn-site.xml._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
      at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3067)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:722)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
      at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

      at org.apache.hadoop.ipc.Client.call(Client.java:1476)
      at org.apache.hadoop.ipc.Client.call(Client.java:1407)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
      at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
      at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
      at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1430)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1226)
      at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
put: File /user/aboutyun/input/yarn-site.xml._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
jps showed that all the processes were present, yet the command above still failed, so I restarted.
During the restart, the script reported "no datanode to stop".
It seems the DataNode had become a dead process.
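If such a stale DataNode has to be cleared by hand, one hedged approach is to find its PID with jps and kill it (the <pid> below is a placeholder, not from the original post):

jps | grep DataNode    # print the stale DataNode's PID
kill -9 <pid>          # replace <pid> with the number printed above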



Start it again:
start-dfs.sh

Still no success, so check the log again:
2015-04-27 08:28:05,274 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /tmp/hadoop-aboutyun/dfs/data/in_use.lock acquired by nodename 13969@ubuntu
2015-04-27 08:28:05,278 WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-aboutyun/dfs/data: namenode clusterID = CID-adabf762-f2f4-43b9-a807-7501f83a9176; datanode clusterID = CID-5c0474f8-7030-4fbc-bb79-6c9163afc5b8
2015-04-27 08:28:05,279 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000. Exiting.
java.io.IOException: All specified directories are failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1387)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1352)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:228)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:852)
        at java.lang.Thread.run(Thread.java:744)
2015-04-27 08:28:05,283 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000
2015-04-27 08:28:05,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2015-04-27 08:28:07,306 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2015-04-27 08:28:07,309 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2015-04-27 08:28:07,310 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at ubuntu/127.0.1.1
************************************************************/
Go into /tmp/hadoop-aboutyun/dfs/data and edit the VERSION file so that the datanode clusterID matches the namenode clusterID reported in the log above.
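For reference, a sketch of that edit (assuming the default data directory used throughout this post; in Hadoop 2.x the VERSION file sits under current/):

# set the datanode clusterID to the namenode clusterID from the log above
sed -i 's/^clusterID=.*/clusterID=CID-adabf762-f2f4-43b9-a807-7501f83a9176/' /tmp/hadoop-aboutyun/dfs/data/current/VERSION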


Then stop the cluster:
stop-dfs.sh
and start it again:
start-dfs.sh

Verification:
When stopping the cluster you can now see "stopping datanode", which shows the edit worked.
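As a further check after starting the cluster, jps should list a DataNode again (a sketch; PIDs differ per machine):

jps    # expect NameNode, DataNode and SecondaryNameNode in this pseudo-distributed setup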



---------------------------------------------------------------------------
Run the command again: hdfs dfs -put etc/hadoop input
The following error appeared:
put: File /user/aboutyun/input/yarn-env.sh._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
15/04/27 08:16:30 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/aboutyun/input/yarn-site.xml._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
      (same stack trace as above)
put: File /user/aboutyun/input/yarn-site.xml._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.

ls: Call From java.net.UnknownHostException: ubuntu: ubuntu to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

Solution: edit /etc/hosts, remembering to comment out the "127.0.1.1 ubuntu" line:

127.0.0.1       localhost
#127.0.1.1      ubuntu
10.0.0.81       ubuntu

Many people run into this problem. Fixes also include disabling the firewall and other factors, but here what was needed was to comment out 127.0.1.1.
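A few quick checks after editing /etc/hosts (a sketch; output will vary):

hostname                # should print ubuntu
ping -c 1 ubuntu        # should now resolve to 10.0.0.81
hdfs dfsadmin -report   # should reach localhost:9000 and report one live datanode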
----------------------------------------------------------------------------

ls: Call From java.net.UnknownHostException: ubuntu: ubuntu to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Check the log: more hadoop-aboutyun-namenode-ubuntu.log

Directory /tmp/hadoop-aboutyun/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.

So the name folder was missing. Create it by hand under /tmp/hadoop-aboutyun/dfs/:
mkdir name

Format the NameNode again: success.
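Putting the recovery together, a minimal sketch (assuming the default /tmp storage locations used in this post):

mkdir /tmp/hadoop-aboutyun/dfs/name
hdfs namenode -format    # confirm if prompted to re-format the storage directory
start-dfs.sh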
#####################################

Check the files uploaded with hdfs dfs -put etc/hadoop input.
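A simple way to do this is to list the directory (run from hadoop_home, like the put above):

bin/hdfs dfs -ls input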




Note the path issue here as well: change into hadoop_home, then run
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'
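One note on rerunning the example: the job fails if output already exists, so remove it first (a standard HDFS command, not shown in the original post):

bin/hdfs dfs -rm -r output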




View the result with:
hdfs dfs -cat /user/aboutyun/output/part-r-00000

Output:
6      dfs.audit.logger
4      dfs.class
3      dfs.server.namenode.
2      dfs.period
2      dfs.audit.log.maxfilesize
2      dfs.audit.log.maxbackupindex
1      dfsmetrics.log
1      dfsadmin
1      dfs.servers
1      dfs.replication
1      dfs.permissions
1      dfs.file






##################################

hdfs dfs -get output output
Problem encountered:
WARN hdfs.DFSClient: DFSInputStream has been closed already
Left to resolve later.



jseven posted on 2015-5-15 15:59:27

Many thanks! Following these two articles, I installed hadoop 2.8 successfully. As for that final warning, there is a record of the problem on the Apache JIRA:
https://issues.apache.org/jira/browse/HDFS-8099
The suggestion there is to change the log level in the source from WARN to DEBUG. The code:
diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
index cf8015f..9f7b15c 100644
--- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
+++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
@@ -666,7 +666,7 @@ private synchronized DatanodeInfo blockSeekTo(long target) throws IOException {
   @Override
   public synchronized void close() throws IOException {
   if (!closed.compareAndSet(false, true)) {
-      DFSClient.LOG.warn("DFSInputStream has been closed already");
+      DFSClient.LOG.debug("DFSInputStream has been closed already");
       return;
   }
   dfsClient.checkOpen();
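A recompile-free alternative (my own suggestion, not from the JIRA discussion) is to raise the log level for the DFSClient class in etc/hadoop/log4j.properties; note this hides every WARN from DFSClient, not just this one:

# append to etc/hadoop/log4j.properties (assumes the stock Hadoop log4j setup)
echo "log4j.logger.org.apache.hadoop.hdfs.DFSClient=ERROR" >> etc/hadoop/log4j.properties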

jkdcdlly posted on 2015-6-19 10:15:54

OP: "Go into /tmp/hadoop-aboutyun/dfs/data and edit the VERSION file"
This directory does not exist on my machine.


jkdcdlly posted on 2015-6-19 10:23:56

OK, found it.

ashic posted on 2015-7-23 22:54:19

WARN hdfs.DFSClient: DFSInputStream has been closed already
Has this problem been solved?

snn123456 posted on 2015-10-21 17:14:36

Thanks for sharing!

ableq posted on 2015-11-2 10:31:22

After installing hadoop 2.7.1, do the input and output directories exist automatically? I did not see anywhere that these two directories have to be created by hand.

chinaboy posted on 2016-4-28 14:25:50

ableq posted on 2015-11-2 10:31:
After installing hadoop 2.7.1, do the input and output directories exist automatically? I did not see anywhere that these two directories have to be created by hand.

input and output are paths inside HDFS.
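Concretely, relative HDFS paths resolve against the user's HDFS home directory /user/<username>, so for the aboutyun user in this post:

hdfs dfs -ls /user/aboutyun/input    # where the relative path "input" points
hdfs dfs -ls /user/aboutyun/output   # where the grep job wrote its result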
