
A problem that has been bugging me for a week, experts please help

masterice posted on 2015-1-26 13:15:45
I'm running Hadoop 1.2.1 on three Ubuntu machines (virtual machines): one namenode and two datanodes. I'm running a simple MapReduce job. Whenever the input exceeds 524287 lines, the job hangs. I have already configured the following in mapred-site.xml, but it still errors out and hangs:
        <property>
          <name>mapred.task.timeout</name>
          <value>1800000</value>
          <description>The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string.</description>
        </property>


Below is the console output from MyEclipse 8.5, followed by the JobTracker log.

15/01/26 12:18:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/26 12:18:50 INFO input.FileInputFormat: Total input paths to process : 1
15/01/26 12:18:50 WARN snappy.LoadSnappy: Snappy native library not loaded
15/01/26 12:18:50 INFO mapred.JobClient: Running job: job_local170048910_0001
15/01/26 12:18:50 INFO mapred.LocalJobRunner: Waiting for map tasks
15/01/26 12:18:50 INFO mapred.LocalJobRunner: Starting task: attempt_local170048910_0001_m_000000_0
15/01/26 12:18:50 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
15/01/26 12:18:50 INFO mapred.MapTask: Processing split: hdfs://192.168.10.100:9000/user/grid/cyyytest/555:0+44874380
15/01/26 12:18:50 INFO mapred.MapTask: io.sort.mb = 100
15/01/26 12:18:50 INFO mapred.MapTask: data buffer = 79691776/99614720
15/01/26 12:18:50 INFO mapred.MapTask: record buffer = 262144/327680
15/01/26 12:18:51 INFO mapred.JobClient:  map 0% reduce 0%
15/01/26 12:18:53 INFO mapred.MapTask: Spilling map output: record full = true
15/01/26 12:18:53 INFO mapred.MapTask: bufstart = 0; bufend = 12481493; bufvoid = 99614720
15/01/26 12:18:53 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
15/01/26 12:18:54 INFO mapred.MapTask: Finished spill 0
15/01/26 12:18:55 INFO mapred.MapTask: Spilling map output: record full = true
15/01/26 12:18:55 INFO mapred.MapTask: bufstart = 12481493; bufend = 24951394; bufvoid = 99614720
15/01/26 12:18:55 INFO mapred.MapTask: kvstart = 262144; kvend = 196607; length = 327680
15/01/26 12:18:55 INFO mapred.MapTask: Starting flush of map output
15/01/26 12:18:56 INFO mapred.LocalJobRunner:
15/01/26 12:18:56 INFO mapred.MapTask: Finished spill 1
15/01/26 12:18:56 INFO mapred.MapTask: Finished spill 2
15/01/26 12:18:56 INFO mapred.Merger: Merging 3 sorted segments
15/01/26 12:18:56 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 15673220 bytes
15/01/26 12:18:56 INFO mapred.MapTask: Starting flush of map output
15/01/26 12:18:57 INFO mapred.JobClient:  map 100% reduce 0%
15/01/26 12:18:59 INFO mapred.LocalJobRunner:
15/01/26 12:19:02 INFO mapred.LocalJobRunner:

============================== JobTracker log ==============================

failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask:
Task attempt_201501191057_0006_m_000000_0 failed to report status for 1800 seconds. Killing!

2015-01-19 17:46:04,853 INFO org.apache.hadoop.mapred.Task: Communication exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: JvmValidate Failed. Ignoring request from task: attempt_201501191057_0006_m_000000_0, with JvmId: jvm_201501191057_0006_m_-747149602
        at org.apache.hadoop.mapred.TaskTracker.validateJVM(TaskTracker.java:3465)
        at org.apache.hadoop.mapred.TaskTracker.ping(TaskTracker.java:3598)
        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)

        at org.apache.hadoop.ipc.Client.call(Client.java:1113)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy1.ping(Unknown Source)
        at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:685)
        at java.lang.Thread.run(Thread.java:744)

2015-01-19 17:46:04,857 INFO org.apache.hadoop.mapred.Task: Process Thread Dump: Communication exception
12 active threads
Thread 23 (Readahead Thread #3):
  State: WAITING
  Blocked count: 0
  Waited count: 1
  Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@10496f0
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
    java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:744)
Thread 22 (Readahead Thread #2):
  State: WAITING
  Blocked count: 0
  Waited count: 1
  Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@10496f0
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
    java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:744)
Thread 21 (Readahead Thread #1):
  State: WAITING
  Blocked count: 0
  Waited count: 2
  Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@10496f0
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
    java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:744)
Thread 20 (Readahead Thread #0):
  State: WAITING
  Blocked count: 0
  Waited count: 3
  Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@10496f0
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
    java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:744)
Thread 14 (communication thread):
  State: RUNNABLE
  Blocked count: 2401
  Waited count: 4815
  Stack:
    sun.management.ThreadImpl.getThreadInfo1(Native Method)
    sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:174)
    sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:139)
    org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
    org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:203)
    org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:703)
    java.lang.Thread.run(Thread.java:744)
Thread 12 (Timer for 'MapTask' metrics system):
  State: TIMED_WAITING
  Blocked count: 0
  Waited count: 725
  Stack:
    java.lang.Object.wait(Native Method)
    java.util.TimerThread.mainLoop(Timer.java:552)
    java.util.TimerThread.run(Timer.java:505)
Thread 10 (Thread for syncLogs):
  State: TIMED_WAITING
  Blocked count: 5
  Waited count: 1449
  Stack:
    java.lang.Thread.sleep(Native Method)
    org.apache.hadoop.mapred.Child$3.run(Child.java:139)
Thread 8 (IPC Client (47) connection to /127.0.0.1:45027 from job_201501191057_0006):
  State: TIMED_WAITING
  Blocked count: 2415
  Waited count: 2416
  Stack:
    java.lang.Object.wait(Native Method)
    org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:747)
    org.apache.hadoop.ipc.Client$Connection.run(Client.java:789)
Thread 4 (Signal Dispatcher):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
Thread 3 (Finalizer):
  State: WAITING
  Blocked count: 32
  Waited count: 33
  Waiting on java.lang.ref.ReferenceQueue$Lock@b303c1
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
    java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
Thread 2 (Reference Handler):
  State: WAITING
  Blocked count: 37
  Waited count: 38
  Waiting on java.lang.ref.Reference$Lock@bb9a37
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:503)
    java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
Thread 1 (main):
  State: WAITING
  Blocked count: 7
  Waited count: 9
  Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1616864
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1294)
    org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
    org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793)
    org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
    org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    java.security.AccessController.doPrivileged(Native Method)
    javax.security.auth.Subject.doAs(Subject.java:415)
    org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    org.apache.hadoop.mapred.Child.main(Child.java:249)


Replies (17)

masterice posted on 2015-1-26 16:38:35
No replies yet...

bioger_hit posted on 2015-1-26 17:25:13
Try two different data sets. If both run fine, this may not be a hang issue at all but a problem in your program's processing logic.

Comment

That is, try a larger data volume, or switch to a different data source, to pin down where the problem really is. Posted 2015-1-26 23:21

595460482@qq.co posted on 2015-1-26 19:57:14
When a task runs for a long time, you often get a timeout error like the following:

Task attempt_201005281116_119912_r_000823_0 failed to report status for 606 seconds. Killing!
10/06/10 10:49:45 INFO mapred.JobClient: Task Id : attempt_201005281116_119912_r_000015_1, Status : FAILED

There are two ways to fix it:
1) Report progress periodically from your code by calling context.progress().
2) Add or modify this in mapred-site.xml:
<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value> <!-- 30 minutes -->
</property>

One more question: did you change this config file on all three machines (VMs)?
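
The first fix above (reporting progress from the code) can be sketched as follows. This is a minimal, self-contained illustration of the pattern only: the nested `Context` interface is a stub standing in for Hadoop's `org.apache.hadoop.mapreduce.Mapper.Context`, and `REPORT_EVERY` is an arbitrary interval; in a real job the `ctx.progress()` call goes inside your `map()` method or record loop.

```java
// Sketch: ping the framework every N records so the task is not killed
// for failing to report status within mapred.task.timeout.
public class ProgressReportingSketch {

    // Stub standing in for org.apache.hadoop.mapreduce.Mapper.Context.
    interface Context {
        void progress();
    }

    static final int REPORT_EVERY = 10_000; // arbitrary; tune to your record rate

    // Processes records, calling ctx.progress() every REPORT_EVERY records.
    static void processAll(int totalRecords, Context ctx) {
        for (int i = 1; i <= totalRecords; i++) {
            // ... per-record map work would go here ...
            if (i % REPORT_EVERY == 0) {
                ctx.progress(); // heartbeat: resets the timeout clock
            }
        }
    }

    public static void main(String[] args) {
        final int[] pings = {0};
        processAll(524_288, () -> pings[0]++);
        System.out.println("progress() called " + pings[0] + " times");
    }
}
```

For 524288 records with this interval, progress() fires 52 times, which is more than enough to keep the TaskTracker's liveness check satisfied.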

595460482@qq.co posted on 2015-1-26 20:03:48
http://stackoverflow.com/questions/5864589/how-to-fix-task-attempt-201104251139-0295-r-000006-0-failed-to-report-status-fo

masterice posted on 2015-1-27 08:56:01
bioger_hit posted on 2015-1-26 17:25
Try two different data sets. If both run fine, this may not be a hang issue at all but a problem in your program's processing logic.

Thanks for your reply. Mine is strange: at 524288 rows it dies, at 524287 rows it works fine. To rule out bad data I deleted row 524288 and copied row 524287 into its place, and it still died. It seems the moment "Finished spill 2" appears, the job is done for. I know there are three spill passes by default, but adding min.num.spill.for.combine=5 to mapred-site.xml made no difference:
        <property>
          <name>min.num.spill.for.combine</name>
          <value>5</value>
        </property>
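
A side note on the config above, offered as a sketch rather than a confirmed diagnosis: in Hadoop 1.x, min.num.spill.for.combine (default 3) only decides whether the combiner runs during the spill/merge once that many spill files exist; it does not limit how many spills occur. The console line `record buffer = 262144/327680` shows the record-metadata buffer spilling every 262144 records, and 524288 is exactly 2 x 262144, which matches the point where the job dies. The record buffer is sized by io.sort.mb and io.sort.record.percent, so one experiment is to enlarge them so the input produces fewer spills (the values below are illustrative, not a recommended setting):

```xml
<!-- Illustrative values only: enlarge the sort buffer and the share of it
     reserved for record metadata, reducing the number of spills. -->
<property>
  <name>io.sort.mb</name>
  <value>200</value>
</property>
<property>
  <name>io.sort.record.percent</name>
  <value>0.15</value>
</property>
```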



masterice posted on 2015-1-27 08:59:54
595460482@qq.co posted on 2015-1-26 19:57
When a task runs for a long time, you often get a timeout error like the following:

Task attempt_201005281116_119912_r_000823_0  ...

Thanks for your reply.
I don't think the timeout is the root cause. The job hangs every time "Finished spill 2" appears; the timeout is a consequence of the hang. My data is only 524288 rows, a 42.7 MB file, tiny compared with typical log files. mapred.task.timeout=1800000 is set on all three machines.

masterice posted on 2015-1-27 09:01:06
595460482@qq.co posted on 2015-1-26 20:03
http://stackoverflow.com/questions/5864589/how-to-fix-task-attempt-201104251139-0295-r-000006-0-fail ...

Thanks for your reply.
I've read that post; I don't think it's the same problem. Increasing the timeout doesn't help either: I once raised it to 2 hours and it still hung.

bioger_hit posted on 2015-1-27 13:41:37
masterice posted on 2015-1-27 09:01
Thanks for your reply.
I've read that post; I don't think it's the same problem. Increasing the timeout doesn't help either: I once raised it to ...

So you're saying it fails at row 524287. Try deleting that row and see what happens.

masterice posted on 2015-1-27 14:56:53
bioger_hit posted on 2015-1-27 13:41
So you're saying it fails at row 524287. Try deleting that row and see what happens.

Thanks.
It actually fails at row 524288: if I delete row 524288 it runs fine. I thought row 524288's data was bad, so I deleted the original row 524288 and copied row 524287 into its place. The file is still 524288 rows, with rows 524287 and 524288 identical, and it still hangs.