
Hadoop 2.2 running MapReduce (wordcount): a summary of common problems

pig2 · posted 2014-5-15 02:35:12 · last edited by pig2 on 2014-5-15 16:46
Questions this post answers:
1. "Not a valid JAR": what are the possible causes?
2. What error does running wordcount report when a path is wrong?
3. What error appears when the input data is lost while a MapReduce job is running?





This post is a follow-up to:

hadoop2 fully-distributed high-reliability installation guide

hadoop2.X handbook 1: checking the status of the master node, slave1, and the whole cluster through the web ports

hadoop2.X handbook 2: how to run the bundled wordcount example

Here is a round-up of problems encountered when running MapReduce.

Problem 1: Not a valid JAR

Take the following command as an example and look at what is in it:
  hadoop jar /hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /data/wordcount /output/wordcount

hadoop-mapreduce-examples-2.2.0.jar actually lives under the path shown below, so if the path in your command is wrong you will get:

  Not a valid JAR: /hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar


[screenshot: location of hadoop-mapreduce-examples-2.2.0.jar under share/hadoop/mapreduce]


The correct form is:

  hadoop jar /usr/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /data/wordcount /output/wordcount

Note that an incorrect input or output path can also produce this error; the remaining possibility is that the jar file really is invalid.
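To fail fast with a clearer message, you can verify the jar path before submitting. A minimal sketch (the `run_wordcount` wrapper and the `find` hint are illustrative, not part of the original post):

```shell
# Sketch: check that the examples jar exists before handing it to
# "hadoop jar", so a bad path is reported up front instead of as
# "Not a valid JAR".
run_wordcount() {
    jar="$1"; input="$2"; output="$3"
    if [ -f "$jar" ]; then
        hadoop jar "$jar" wordcount "$input" "$output"
    else
        echo "Not a valid path: $jar"
        echo "try: find /usr/hadoop -name 'hadoop-mapreduce-examples*.jar'"
        return 1
    fi
}
```

Usage: `run_wordcount /usr/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar /data/wordcount /output/wordcount`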

Problem 2: Could not complete /tmp/hadoop-yarn/staging/job_1400084979891_0001/job.jar retrying...

The client keeps retrying, and the error looks like this (the staging path contains your username, shown here as yonghuming):

  Could not complete /tmp/hadoop-yarn/staging/yonghuming/.staging/job_1400084979891_0001/job.jar retrying...

Solution:

This happens because the cluster is in safe mode. After a cluster is formatted it usually starts out in safe mode, so before running MapReduce it is best to check the cluster mode first, then leave safe mode if necessary:

  hdfs dfsadmin -safemode leave

(The older form `hadoop dfsadmin -safemode leave` still works in Hadoop 2.x but is deprecated.)
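A slightly more careful sketch queries the mode first and only forces it off when needed. The `DFSADMIN` variable and the helper function here are illustrative; on a real cluster `DFSADMIN` is simply `hdfs dfsadmin`:

```shell
# Sketch: "hdfs dfsadmin -safemode get" prints "Safe mode is ON" or
# "Safe mode is OFF"; only issue "leave" when it is actually ON.
DFSADMIN="${DFSADMIN:-hdfs dfsadmin}"

leave_safemode_if_needed() {
    state=$($DFSADMIN -safemode get)
    case "$state" in
        *ON*) $DFSADMIN -safemode leave ;;          # force-exit safe mode
        *)    echo "already out of safe mode" ;;
    esac
}
```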

Problem 3: Could not obtain block: BP-1908912651-172.16.77.15-1399795457132:blk_1073741826_1002

The odd thing about this error is that a block of the input file could not be read even though the node at 172.16.77.15 had not gone down. The output looks like this:
  hadoop jar /usr/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /data/wordcount /output/wordcount
  14/05/14 09:38:55 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.77.15:8032
  14/05/14 09:38:56 INFO input.FileInputFormat: Total input paths to process : 1
  14/05/14 09:38:56 INFO mapreduce.JobSubmitter: number of splits:1
  14/05/14 09:38:56 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
  14/05/14 09:38:56 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
  14/05/14 09:38:56 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
  14/05/14 09:38:56 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
  14/05/14 09:38:56 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
  14/05/14 09:38:56 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
  14/05/14 09:38:56 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
  14/05/14 09:38:56 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
  14/05/14 09:38:56 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
  14/05/14 09:38:56 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
  14/05/14 09:38:56 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
  14/05/14 09:38:56 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
  14/05/14 09:38:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1400084979891_0002
  14/05/14 09:38:58 INFO impl.YarnClientImpl: Submitted application application_1400084979891_0002 to ResourceManager at master/172.16.77.15:8032
  14/05/14 09:38:58 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1400084979891_0002/
  14/05/14 09:38:58 INFO mapreduce.Job: Running job: job_1400084979891_0002
  14/05/14 09:39:19 INFO mapreduce.Job: Job job_1400084979891_0002 running in uber mode : false
  14/05/14 09:39:19 INFO mapreduce.Job:  map 0% reduce 0%
  14/05/14 09:40:12 INFO mapreduce.Job:  map 100% reduce 0%
  14/05/14 09:40:12 INFO mapreduce.Job: Task Id : attempt_1400084979891_0002_m_000000_0, Status : FAILED
  Error: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1908912651-172.16.77.15-1399795457132:blk_1073741826_1002 file=/data/wordcount/inputWord
          at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:838)
          at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:526)
          at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:749)
          at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793)
          at java.io.DataInputStream.read(DataInputStream.java:100)
          at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211)
          at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
          at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:164)
          at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
          at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
          at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
          at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
          at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
          at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:415)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
          at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
  14/05/14 09:40:14 INFO mapreduce.Job:  map 0% reduce 0%
  14/05/14 09:41:09 INFO mapreduce.Job: Task Id : attempt_1400084979891_0002_m_000000_1, Status : FAILED
  Error: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1908912651-172.16.77.15-1399795457132:blk_1073741826_1002 file=/data/wordcount/inputWord
          (identical stack trace as above, elided)
  14/05/14 09:41:33 INFO mapreduce.Job: Task Id : attempt_1400084979891_0002_m_000000_2, Status : FAILED
  Error: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1908912651-172.16.77.15-1399795457132:blk_1073741826_1002 file=/data/wordcount/inputWord
          (identical stack trace as above, elided)
  14/05/14 09:41:59 INFO mapreduce.Job:  map 100% reduce 100%
  14/05/14 09:41:59 INFO mapreduce.Job: Job job_1400084979891_0002 failed with state FAILED due to: Task failed task_1400084979891_0002_m_000000
  Job failed as tasks failed. failedMaps:1 failedReduces:0
  14/05/14 09:41:59 INFO mapreduce.Job: Counters: 5
          Job Counters
                  Failed map tasks=4
                  Launched map tasks=4
                  Other local map tasks=4
                  Total time spent by all maps in occupied slots (ms)=152716
                  Total time spent by all reduces in occupied slots (ms)=0
When you hit the error above, check the input file with:

  hadoop fs -text /data/wordcount/inputWord

inputWord here is the uploaded input file. If it cannot be read and you see the error below, a DataNode has lost its connection or is no longer usable:


  hadoop fs -text /data/wordcount/inputWord
  14/05/14 10:29:23 INFO hdfs.DFSClient: No node available for BP-1908912651-172.16.77.15-1399795457132:blk_1073741826_1002 file=/data/wordcount/inputWord
  14/05/14 10:29:23 INFO hdfs.DFSClient: Could not obtain BP-1908912651-172.16.77.15-1399795457132:blk_1073741826_1002 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
  14/05/14 10:29:23 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 1611.781101650108 msec.
  14/05/14 10:29:24 INFO hdfs.DFSClient: No node available for BP-1908912651-172.16.77.15-1399795457132:blk_1073741826_1002 file=/data/wordcount/inputWord
  14/05/14 10:29:24 INFO hdfs.DFSClient: Could not obtain BP-1908912651-172.16.77.15-1399795457132:blk_1073741826_1002 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
  14/05/14 10:29:24 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 IOException, will wait for 3653.3994797694913 msec.
  14/05/14 10:29:28 INFO hdfs.DFSClient: No node available for BP-1908912651-172.16.77.15-1399795457132:blk_1073741826_1002 file=/data/wordcount/inputWord
  14/05/14 10:29:28 INFO hdfs.DFSClient: Could not obtain BP-1908912651-172.16.77.15-1399795457132:blk_1073741826_1002 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
  14/05/14 10:29:28 WARN hdfs.DFSClient: DFS chooseDataNode: got # 3 IOException, will wait for 6618.009385385317 msec.
  14/05/14 10:29:35 WARN hdfs.DFSClient: DFS Read
  org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1908912651-172.16.77.15-1399795457132:blk_1073741826_1002 file=/data/wordcount/inputWord
          at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:838)
          at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:526)
          at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:749)
          at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793)
          at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:601)
          at java.io.DataInputStream.readShort(DataInputStream.java:312)
          at org.apache.hadoop.fs.shell.Display$Text.getInputStream(Display.java:130)
          at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:98)
          at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:306)
          at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
          at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
          at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
          at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
          at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
          at org.apache.hadoop.fs.FsShell.run(FsShell.java:255)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
          at org.apache.hadoop.fs.FsShell.main(FsShell.java:305)
  text: Could not obtain block: BP-1908912651-172.16.77.15-1399795457132:blk_1073741826_1002 file=/data/wordcount/inputWord
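Besides `hadoop fs -text`, `hdfs fsck` reports exactly which blocks of a file are missing or corrupt. A sketch; the `FSCK` variable and `check_input` helper are illustrative, and on a real cluster you would simply run `hdfs fsck /data/wordcount/inputWord -files -blocks -locations` directly:

```shell
# Sketch: fsck output contains CORRUPT / MISSING markers when block
# replicas are unreachable, matching the BlockMissingException above.
FSCK="${FSCK:-hdfs fsck}"

check_input() {
    report=$($FSCK "$1" -files -blocks -locations)
    case "$report" in
        *CORRUPT*|*MISSING*) echo "damaged: $1"; return 1 ;;
        *)                   echo "healthy: $1" ;;
    esac
}
```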












Replies (39)
long_ac posted on 2015-5-9 18:14:09

Quoting roant (2015-1-18 12:19): "Hi, I'm hitting the same problem you describe. After 'Running job: ....' it just hangs there; going into the backend ..."

I also had jobs stuck in the pending state that would not move forward. After several days of fiddling I finally solved it, so here is my solution; I searched a lot of material online over those days and found no reliable answer.
The cause: I had set yarn.nodemanager.resource.memory-mb to 1024 MB, i.e. 1024 MB of container memory per node, but the wordcount job I ran needed more memory than that, so the job stayed in the pending state forever.

If you have configured yarn.nodemanager.resource.memory-mb, raise the value, or just use the default and then tune it as needed.

Hope this helps anyone stuck on this problem~~
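To illustrate, here is a yarn-site.xml fragment along the lines long_ac describes. The values are illustrative only; for reference, in Hadoop 2.2 the default NodeManager memory is 8192 MB and the default MapReduce ApplicationMaster container requests 1536 MB, so with resource.memory-mb set to 1024 the AM container can never even be scheduled:

```xml
<!-- yarn-site.xml sketch: give each NodeManager enough memory for at
     least the ApplicationMaster container plus one task container. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
```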

小鱼 posted on 2014-8-9 17:43:21
Mod, what do I do when a DataNode loses its connection? Restart the whole cluster?

pig2 posted on 2014-8-10 12:07:33

Quoting 小鱼 (2014-8-9 17:43): "Mod, what do I do when a DataNode loses its connection? Restart the whole cluster?"

You can start just that one node. On the node itself (e.g. slave1), run:

  ./sbin/hadoop-daemon.sh start datanode

(hadoop-daemon.sh does not take a hostname argument; it starts the daemon on whichever machine you run it on.)

For details, see: how to locate the hadoop and hbase commands, and how to start a hadoop DataNode or an hbase regionserver individually
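After restarting, it is worth confirming the process actually came back. A sketch; the `JPS` variable is parameterized here only so the check can be illustrated, on the node itself it is plain `jps`, and cluster-wide `hdfs dfsadmin -report` should then show no dead nodes:

```shell
# Sketch: a restarted DataNode should appear in jps output on that node.
JPS="${JPS:-jps}"

datanode_running() {
    if $JPS | grep -q DataNode; then
        echo "DataNode is running"
    else
        echo "DataNode is NOT running"
        return 1
    fi
}
```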


Paul.Li posted on 2014-9-29 19:14:42
Hi mod! My job stays in the pending state after submission, and the logs show no errors. I don't know why:
[yarn@hadoop01 ~]$ hadoop jar /usr/local/yarn/Hadoop/hadoop-2.2.0/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.2.0-sources.jar org.apache.hadoop.examples.WordCount /data/wordcount /output/wordcount  
14/09/29 01:58:48 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.232.128:8032
14/09/29 01:58:49 INFO input.FileInputFormat: Total input paths to process : 1
14/09/29 01:58:49 INFO mapreduce.JobSubmitter: number of splits:1
14/09/29 01:58:49 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/09/29 01:58:49 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/09/29 01:58:49 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/09/29 01:58:49 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
14/09/29 01:58:49 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/09/29 01:58:49 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/09/29 01:58:49 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/09/29 01:58:49 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/09/29 01:58:49 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/09/29 01:58:49 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/09/29 01:58:49 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/09/29 01:58:49 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/09/29 01:58:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1411980196862_0003
14/09/29 01:58:50 INFO impl.YarnClientImpl: Submitted application application_1411980196862_0003 to ResourceManager at hadoop01/192.168.232.128:8032
14/09/29 01:58:50 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1411980196862_0003/
14/09/29 01:58:50 INFO mapreduce.Job: Running job: job_1411980196862_0003

admin posted on 2014-9-29 21:12:03

Quoting Paul.Li (2014-9-29 19:14): "Hi mod! My job stays in the pending state after submission, and the logs show no errors. [yarn@hadoop01 ~]$ ha ..."

Have you left safe mode?
Also check the memory situation.

howtodown posted on 2014-9-29 21:15:51

Quoting Paul.Li (2014-9-29 19:14): "Hi mod! My job stays in the pending state after submission, and the logs show no errors. [yarn@hadoop01 ~]$ ha ..."

What is the content of your wordcount input data? If nothing else, wait a while longer and see.

Paul.Li posted on 2014-9-29 23:32:07

Quoting howtodown (2014-9-29 21:15): "What is the content of your wordcount input data? If nothing else, wait a while longer and see."

The input data is:
[yarn@hadoop01 ~]$ hadoop fs -text /data/wordcount/inputWord
hello Paul.Li
hello hadoop01
hello hadoop02
hello hadoop03
hello hadoop04
Paul.Li first
[yarn@hadoop01 ~]$
I waited a long time and it still didn't finish. I remember that when I tested version 1.2.1, results came out after only a few minutes.

admin posted on 2014-9-29 23:34:56

Quoting Paul.Li (2014-9-29 23:32): "The input data is: [yarn@hadoop01 ~]$ hadoop fs -text /data/wordcount/inputWord hello Paul.Li ..."

Your input path is wrong; follow this format:

hadoop jar /usr/local/yarn/Hadoop/hadoop-2.2.0/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.2.0-sources.jar org.apache.hadoop.examples.WordCount /data/wordcount/inputWord /output/wordcount

Paul.Li posted on 2014-9-29 23:50:26

Quoting admin (2014-9-29 23:34): "Your input path is wrong; follow this format: hadoop jar /usr/local/yarn/Hadoop/hadoop-2.2.0/share/hadoop/mapredu ..."

Still not working, so it's probably not that. I'll try reinstalling tomorrow.

howtodown posted on 2014-9-30 17:26:55

Quoting Paul.Li (2014-9-29 23:50): "Still not working, so it's probably not that. I'll try reinstalling tomorrow."

First check whether all the daemons are up, then carefully recheck your input and output paths.
