分享

hadoop程序运行超时出错,其中之一的container日志如下

自己编写mapreduce程序,map过程中超时错误,而且系统变得奇卡,我在map中打印了下map运行时间:
container2的输出:6394条数据文件相交时间花费:0分2秒
container3的输出:4462条数据文件相交时间花费:5878分23秒
container4的输出:3219条数据文件相交时间花费:10469分53秒
container3的syslog日志摘取:[mw_shl_code=applescript,true]2018-07-31 10:49:04,404 WARN [communication thread] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.111.12:34536 remote=/192.168.111.11:33422]
2018-07-31 10:50:10,146 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.io.IOException: Failed on local exception: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.111.12:34536 remote=/192.168.111.11:33422]; Host Details : local host is: "slave2/192.168.111.12"; destination host is: "slave1":33422;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:782)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1493)
    at org.apache.hadoop.ipc.Client.call(Client.java:1435)
    at org.apache.hadoop.ipc.Client.call(Client.java:1345)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:249)
    at com.sun.proxy.$Proxy8.ping(Unknown Source)
    at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:817)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.111.12:34536 remote=/192.168.111.11:33422]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:755)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:718)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:811)
    at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1550)
    at org.apache.hadoop.ipc.Client.call(Client.java:1381)
    ... 5 more
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.111.12:34536 remote=/192.168.111.11:33422]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1794)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:364)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:410)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:798)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:794)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:793)
    ... 8 more[/mw_shl_code]

container3的stderr日志摘取:
[mw_shl_code=applescript,true]SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.8.3/tmp/nm-local-dir/usercache/hadoop/appcache/application_1533000535218_0004/filecache/10/job.jar/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory][/mw_shl_code]

container3的stdeout日志摘取:
[mw_shl_code=applescript,true]Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.121-b13 mixed mode):

"IPC Parameter Sending Thread #3" #37 daemon prio=5 os_prio=0 tid=0x0000000000e16000 nid=0x3a22 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"IPC Client (769798433) connection to slave1/192.168.111.11:33422 from job_1533000535218_0004" #36 daemon prio=5 os_prio=0 tid=0x0000000000ff2000 nid=0x3a21 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"SpillThread" #25 daemon prio=5 os_prio=0 tid=0x00007f8908dcd800 nid=0x39bb waiting on condition [0x00007f88f0338000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000fa100e88> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1530)

"org.apache.hadoop.hdfs.PeerCache@1d1f7216" #24 daemon prio=5 os_prio=0 tid=0x00007f8909134800 nid=0x398a waiting on condition [0x00007f88e5ad5000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:253)
        at org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:46)
        at org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:124)
        at java.lang.Thread.run(Thread.java:745)

"communication thread" #18 daemon prio=5 os_prio=0 tid=0x00007f8909082800 nid=0x3944 runnable [0x00007f88f0439000]
   java.lang.Thread.State: RUNNABLE
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
        at org.apache.hadoop.ipc.Client.call(Client.java:1381)
        at org.apache.hadoop.ipc.Client.call(Client.java:1345)
        at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:249)
        at com.sun.proxy.$Proxy8.ping(Unknown Source)
        at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:817)
        at java.lang.Thread.run(Thread.java:745)

"client DomainSocketWatcher" #17 daemon prio=5 os_prio=0 tid=0x00007f8909055800 nid=0x393f runnable [0x00007f88f053a000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
        at org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
        at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
        at java.lang.Thread.run(Thread.java:745)

"Thread for syncLogs" #16 daemon prio=5 os_prio=0 tid=0x00007f8908f15800 nid=0x392e waiting on condition [0x00007f88f0a3b000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000f3a8c830> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

"org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" #14 daemon prio=5 os_prio=0 tid=0x00007f8908df7000 nid=0x3921 in Object.wait() [0x00007f88f1050000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f3aa5b50> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
        - locked <0x00000000f3aa5b50> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
        at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3212)
        at java.lang.Thread.run(Thread.java:745)

"Timer for 'MapTask' metrics system" #11 daemon prio=5 os_prio=0 tid=0x00007f8908ca5000 nid=0x3909 in Object.wait() [0x00007f88f2218000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f3a061d0> (a java.util.TaskQueue)
        at java.util.TimerThread.mainLoop(Timer.java:552)
        - locked <0x00000000f3a061d0> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Timer.java:505)

"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00007f89080c0800 nid=0x386c runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f89080be000 nid=0x386a runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f89080bb000 nid=0x3869 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f89080b9800 nid=0x3868 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f8908086800 nid=0x3865 in Object.wait() [0x00007f88f35f4000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f39388b0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
        - locked <0x00000000f39388b0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f8908082000 nid=0x3863 runnable [0x00007f88f36f5000]
   java.lang.Thread.State: RUNNABLE
        at sun.misc.Unsafe.freeMemory(Native Method)
        at java.nio.DirectByteBuffer$Deallocator.run(DirectByteBuffer.java:94)
        at sun.misc.Cleaner.clean(Cleaner.java:143)
        at java.lang.ref.Reference.tryHandlePending(Reference.java:212)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"main" #1 prio=5 os_prio=0 tid=0x00007f8908010000 nid=0x385b runnable [0x00007f890f25f000]
   java.lang.Thread.State: RUNNABLE
        at java.nio.CharBuffer.<init>(CharBuffer.java:281)
        at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:70)
        at java.nio.CharBuffer.wrap(CharBuffer.java:373)
        at java.nio.CharBuffer.wrap(CharBuffer.java:396)
        at org.apache.hadoop.io.Text.encode(Text.java:450)
        at org.apache.hadoop.io.Text.set(Text.java:198)
        at org.apache.hadoop.io.Text.<init>(Text.java:88)
        at com.geoway.intersect.intersection.ShpRecordReader.nextKeyValue(ShpRecordReader.java:56)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)

"VM Thread" os_prio=0 tid=0x00007f890807a000 nid=0x3861 runnable

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f8908025800 nid=0x385c runnable

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f8908027000 nid=0x385f runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00007f89080c3800 nid=0x386f waiting on condition

JNI global references: 291

Heap
PSYoungGen      total 61952K, used 4437K [0x00000000fbd80000, 0x0000000100000000, 0x0000000100000000)
  eden space 56832K, 7% used [0x00000000fbd80000,0x00000000fc1d5400,0x00000000ff500000)
  from space 5120K, 0% used [0x00000000ff500000,0x00000000ff500000,0x00000000ffa00000)
  to   space 5120K, 0% used [0x00000000ffb00000,0x00000000ffb00000,0x0000000100000000)
ParOldGen       total 136704K, used 110087K [0x00000000f3800000, 0x00000000fbd80000, 0x00000000fbd80000)
  object space 136704K, 80% used [0x00000000f3800000,0x00000000fa381c90,0x00000000fbd80000)
Metaspace       used 26932K, capacity 27108K, committed 27392K, reserved 1073152K
  class space    used 3183K, capacity 3272K, committed 3328K, reserved 1048576K[/mw_shl_code]

为什么会出现这种错误,求支招啊

已有(2)人评论

跳转到指定楼层
bioger_hit 发表于 2018-7-31 18:46:48
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.111.12:34536 remote=/192.168.111.11:33422]

两者端口互不通信,先测试下端口是否是通的
回复

使用道具 举报

1234567_e8xaE 发表于 2018-8-1 11:04:24
bioger_hit 发表于 2018-7-31 18:46
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be rea ...

问题找到了,还是自己编的代码的问题,自己编的工具类里有个流,放在map里几万条数据需要循环几万次,内存耗费执行不下去了,就超时了。还是感谢!
回复

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

关闭

推荐上一条 /2 下一条