Environment: Hive 2.1.1, Spark 2.1.0, Hadoop 2.7.3. The failing Scala program is shown in the first figure.
According to the Spark web UI, execution only gets as far as the second line, qdRDD = rdd.map.......
However, the third line's output, "total partition count", does get printed, so I believe the failure happens when aggregateBuilder is executed; the calling code is shown in figure 2.
This problem has had me stuck for almost a week now; any advice would be greatly appreciated.
I have verified that the file permissions in HDFS are fine: -rw-r--r-- (644), i.e., only the owner has read and write access, while group and others have read-only access.
I also tried changing them to 755, but the error still occurred. I have since re-uploaded the file, so the data file's permissions are back to 644. Connectivity between the machines in the cluster is fine. Thanks!
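For context on one common cause: in Spark on YARN, a java.lang.IllegalStateException: unread block data thrown during shuffle deserialization usually points to a classpath or version mismatch between the driver and the executors (for example, mixed Spark or Hive jars shipped to the containers), rather than an HDFS permission problem. A minimal sketch of pinning one known-good jar set for every container is shown below; the HDFS jar directory and application jar name are hypothetical placeholders, and the main class is taken from the QDigestPushDownBuilder name in the log:

```shell
# Pin a single known-good set of Spark jars for every YARN container,
# so the driver and all executors deserialize task data with identical
# class definitions. spark.yarn.jars is available in Spark 2.x.
# hdfs:///spark/jars-2.1.0 and qdigest-builder.jar are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.jars="hdfs:///spark/jars-2.1.0/*.jar" \
  --class QDigestPushDownBuilder \
  qdigest-builder.jar
```

If every node instead uses a local Spark install, comparing checksums of the jars directory on each host (e.g. `md5sum /opt/spark/jars/* | md5sum` over ssh) is a quick way to rule a jar mismatch in or out.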
17/07/22 16:07:20 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 176 bytes
17/07/22 16:07:46 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 8, hop51, executor 1): java.lang.IllegalStateException: unread block data
at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:159)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:189)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:186)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/07/22 16:07:46 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 1.0 (TID 9, hop51, executor 1, partition 0, NODE_LOCAL, 6099 bytes)
17/07/22 16:07:47 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 1.
17/07/22 16:07:47 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 1)
17/07/22 16:07:47 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
17/07/22 16:07:47 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, hop51, 56938, None)
17/07/22 16:07:47 INFO storage.BlockManagerMaster: Removed 1 successfully in removeExecutor
17/07/22 16:07:47 INFO scheduler.DAGScheduler: Shuffle files lost for executor: 1 (epoch 1)
17/07/22 16:07:47 INFO scheduler.ShuffleMapStage: ShuffleMapStage 0 is now unavailable on executor 1 (5/8, false)
17/07/22 16:07:47 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1500703381725_0003_01_000002 on host: hop51. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
17/07/22 16:07:47 ERROR cluster.YarnScheduler: Lost executor 1 on hop51: Container marked as failed: container_1500703381725_0003_01_000002 on host: hop51. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
17/07/22 16:07:47 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 1.0 (TID 9, hop51, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container marked as failed: container_1500703381725_0003_01_000002 on host: hop51. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
17/07/22 16:07:47 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
17/07/22 16:07:47 INFO storage.BlockManagerMaster: Removal of executor 1 requested
17/07/22 16:07:47 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asked to remove non-existent executor 1
17/07/22 16:07:47 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 1.0 (TID 10, hop33, executor 2, partition 0, NODE_LOCAL, 6099 bytes)
17/07/22 16:07:47 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on hop33:15645 (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:07:47 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 172.16.26.33:37905
17/07/22 16:07:47 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 161 bytes
17/07/22 16:07:47 WARN scheduler.TaskSetManager: Lost task 0.2 in stage 1.0 (TID 10, hop33, executor 2): FetchFailed(null, shuffleId=0, mapId=-1, reduceId=0, message=
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:697)
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:693)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:693)
at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:147)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:49)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:109)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:100)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:99)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
)
17/07/22 16:07:47 INFO scheduler.TaskSetManager: Task 0.2 in stage 1.0 (TID 10) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
17/07/22 16:07:47 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/07/22 16:07:47 INFO scheduler.DAGScheduler: Marking ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) as failed due to a fetch failure from ShuffleMapStage 0 (repartition at QDigestPushDownBuilder.scala:31)
17/07/22 16:07:47 INFO scheduler.DAGScheduler: ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) failed in 27.773 s due to org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:697)
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:693)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:693)
at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:147)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:49)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:109)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:100)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:99)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/07/22 16:07:47 INFO scheduler.DAGScheduler: Resubmitting ShuffleMapStage 0 (repartition at QDigestPushDownBuilder.scala:31) and ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) due to fetch failure
17/07/22 16:07:47 INFO scheduler.DAGScheduler: Resubmitting failed stages
17/07/22 16:07:47 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[6] at repartition at QDigestPushDownBuilder.scala:31), which has no missing parents
17/07/22 16:07:47 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 6.7 KB, free 366.0 MB)
17/07/22 16:07:47 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 3.5 KB, free 366.0 MB)
17/07/22 16:07:47 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 172.16.26.51:48962 (size: 3.5 KB, free: 366.3 MB)
17/07/22 16:07:47 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:996
17/07/22 16:07:47 INFO scheduler.DAGScheduler: Submitting 3 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[6] at repartition at QDigestPushDownBuilder.scala:31)
17/07/22 16:07:47 INFO cluster.YarnScheduler: Adding task set 0.1 with 3 tasks
17/07/22 16:07:47 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.1 (TID 11, hop33, executor 2, partition 0, RACK_LOCAL, 6073 bytes)
17/07/22 16:07:47 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on hop33:15645 (size: 3.5 KB, free: 366.3 MB)
17/07/22 16:07:54 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (172.16.26.33:37925) with ID 3
17/07/22 16:07:54 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.1 (TID 12, hop33, executor 3, partition 1, RACK_LOCAL, 6073 bytes)
17/07/22 16:07:54 INFO storage.BlockManagerMasterEndpoint: Registering block manager hop33:54991 with 366.3 MB RAM, BlockManagerId(3, hop33, 54991, None)
17/07/22 16:08:08 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on hop33:54991 (size: 3.5 KB, free: 366.3 MB)
17/07/22 16:08:08 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hop33:54991 (size: 23.8 KB, free: 366.3 MB)
17/07/22 16:09:17 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on 172.16.26.51:48962 in memory (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:09:17 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on hop33:15645 in memory (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:09:24 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.1 (TID 13, hop33, executor 2, partition 5, RACK_LOCAL, 6073 bytes)
17/07/22 16:09:24 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.1 (TID 11) in 96646 ms on hop33 (executor 2) (1/3)
17/07/22 16:09:57 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.1 (TID 12) in 123006 ms on hop33 (executor 3) (2/3)
17/07/22 16:11:08 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.1 (TID 13) in 104128 ms on hop33 (executor 2) (3/3)
17/07/22 16:11:08 INFO cluster.YarnScheduler: Removed TaskSet 0.1, whose tasks have all completed, from pool
17/07/22 16:11:08 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (repartition at QDigestPushDownBuilder.scala:31) finished in 200.774 s
17/07/22 16:11:08 INFO scheduler.DAGScheduler: looking for newly runnable stages
17/07/22 16:11:08 INFO scheduler.DAGScheduler: running: Set()
17/07/22 16:11:08 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
17/07/22 16:11:08 INFO scheduler.DAGScheduler: failed: Set()
17/07/22 16:11:08 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[10] at map at QDigestPushDownBuilder.scala:56), which has no missing parents
17/07/22 16:11:08 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 3.9 KB, free 366.0 MB)
17/07/22 16:11:08 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 2.2 KB, free 366.0 MB)
17/07/22 16:11:08 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 172.16.26.51:48962 (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:11:08 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:996
17/07/22 16:11:08 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[10] at map at QDigestPushDownBuilder.scala:56)
17/07/22 16:11:08 INFO cluster.YarnScheduler: Adding task set 1.1 with 1 tasks
17/07/22 16:11:08 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.1 (TID 14, hop33, executor 2, partition 0, NODE_LOCAL, 6099 bytes)
17/07/22 16:11:08 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on hop33:15645 (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:11:08 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 172.16.26.33:37905
17/07/22 16:11:08 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 174 bytes
17/07/22 16:11:27 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.1 (TID 14, hop33, executor 2): java.lang.IllegalStateException: unread block data
at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:159)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:189)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:186)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/07/22 16:11:27 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 1.1 (TID 15, hop33, executor 2, partition 0, NODE_LOCAL, 6099 bytes)
17/07/22 16:11:27 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 1.1 (TID 15, hop33, executor 2): FetchFailed(BlockManagerId(2, hop33, 15645, None), shuffleId=0, mapId=0, reduceId=0, message=
org.apache.spark.shuffle.FetchFailedException: /home/user/spark2/apps/data/tmp/nm-local-dir/usercache/spark2/appcache/application_1500703381725_0003/blockmgr-7eb0a7b9-a7a9-4172-bf1c-eb67d4103d77/30/shuffle_0_0_0.index (No such file or directory)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:357)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /home/user/spark2/apps/data/tmp/nm-local-dir/usercache/spark2/appcache/application_1500703381725_0003/blockmgr-7eb0a7b9-a7a9-4172-bf1c-eb67d4103d77/30/shuffle_0_0_0.index (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:199)
at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:302)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:258)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:292)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:120)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:45)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:109)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:100)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:99)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
... 18 more
)
17/07/22 16:11:27 INFO scheduler.TaskSetManager: Task 0.1 in stage 1.1 (TID 15) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
17/07/22 16:11:27 INFO cluster.YarnScheduler: Removed TaskSet 1.1, whose tasks have all completed, from pool
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Marking ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) as failed due to a fetch failure from ShuffleMapStage 0 (repartition at QDigestPushDownBuilder.scala:31)
17/07/22 16:11:27 INFO scheduler.DAGScheduler: ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) failed in 18.613 s due to org.apache.spark.shuffle.FetchFailedException: /home/user/spark2/apps/data/tmp/nm-local-dir/usercache/spark2/appcache/application_1500703381725_0003/blockmgr-7eb0a7b9-a7a9-4172-bf1c-eb67d4103d77/30/shuffle_0_0_0.index (No such file or directory)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:357)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /home/user/spark2/apps/data/tmp/nm-local-dir/usercache/spark2/appcache/application_1500703381725_0003/blockmgr-7eb0a7b9-a7a9-4172-bf1c-eb67d4103d77/30/shuffle_0_0_0.index (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:199)
at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:302)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:258)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:292)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:120)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:45)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:109)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:100)
at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:99)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
... 18 more
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Resubmitting ShuffleMapStage 0 (repartition at QDigestPushDownBuilder.scala:31) and ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) due to fetch failure
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Executor lost: 2 (epoch 3)
17/07/22 16:11:27 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
17/07/22 16:11:27 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, hop33, 15645, None)
17/07/22 16:11:27 INFO storage.BlockManagerMaster: Removed 2 successfully in removeExecutor
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Shuffle files lost for executor: 2 (epoch 3)
17/07/22 16:11:27 INFO scheduler.ShuffleMapStage: ShuffleMapStage 0 is now unavailable on executor 2 (1/8, false)
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Resubmitting failed stages
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[6] at repartition at QDigestPushDownBuilder.scala:31), which has no missing parents
17/07/22 16:11:27 INFO memory.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 6.7 KB, free 366.0 MB)
17/07/22 16:11:27 INFO memory.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 3.5 KB, free 366.0 MB)
17/07/22 16:11:27 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on 172.16.26.51:48962 (size: 3.5 KB, free: 366.3 MB)
17/07/22 16:11:27 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:996
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Submitting 7 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[6] at repartition at QDigestPushDownBuilder.scala:31)
17/07/22 16:11:27 INFO cluster.YarnScheduler: Adding task set 0.2 with 7 tasks
17/07/22 16:11:27 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.2 (TID 16, hop33, executor 2, partition 2, NODE_LOCAL, 6073 bytes)
17/07/22 16:11:27 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.2 (TID 17, hop33, executor 3, partition 6, NODE_LOCAL, 6073 bytes)
17/07/22 16:11:27 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on hop33:54991 (size: 3.5 KB, free: 366.3 MB)
17/07/22 16:11:27 WARN server.TransportChannelHandler: Exception in connection from /172.16.26.33:37905
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:275)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:652)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:745)
17/07/22 16:11:27 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 2.
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Executor lost: 2 (epoch 5)
17/07/22 16:11:27 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
17/07/22 16:11:27 INFO storage.BlockManagerMaster: Removed 2 successfully in removeExecutor
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Shuffle files lost for executor: 2 (epoch 5)
17/07/22 16:11:27 ERROR cluster.YarnScheduler: Lost executor 2 on hop33: Container marked as failed: container_1500703381725_0003_01_000003 on host: hop33. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
17/07/22 16:11:27 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1500703381725_0003_01_000003 on host: hop33. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
17/07/22 16:11:27 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 0.2 (TID 16, hop33, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_1500703381725_0003_01_000003 on host: hop33. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
17/07/22 16:11:27 INFO storage.BlockManagerMaster: Removal of executor 2 requested
17/07/22 16:11:27 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
17/07/22 16:11:27 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asked to remove non-existent executor 2
17/07/22 16:11:52 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (172.16.26.53:55755) with ID 4
17/07/22 16:11:52 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.2 (TID 18, hop53, executor 4, partition 0, NODE_LOCAL, 6073 bytes)
17/07/22 16:11:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager hop53:33222 with 366.3 MB RAM, BlockManagerId(4, hop53, 33222, None)
17/07/22 16:12:08 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on hop53:33222 (size: 3.5 KB, free: 366.3 MB)
17/07/22 16:12:08 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hop53:33222 (size: 23.8 KB, free: 366.3 MB)
17/07/22 16:12:38 INFO scheduler.TaskSetManager: Starting task 1.1 in stage 0.2 (TID 19, hop33, executor 3, partition 2, NODE_LOCAL, 6073 bytes)
17/07/22 16:12:38 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.2 (TID 17) in 70849 ms on hop33 (executor 3) (1/7)
17/07/22 16:13:48 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.2 (TID 20, hop53, executor 4, partition 3, RACK_LOCAL, 6073 bytes)
17/07/22 16:13:48 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.2 (TID 18) in 116134 ms on hop53 (executor 4) (2/7)
17/07/22 16:13:57 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.2 (TID 21, hop33, executor 3, partition 4, RACK_LOCAL, 6073 bytes)
17/07/22 16:13:57 INFO scheduler.TaskSetManager: Finished task 1.1 in stage 0.2 (TID 19) in 78986 ms on hop33 (executor 3) (3/7)
17/07/22 16:14:14 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.2 (TID 22, hop53, executor 4, partition 5, RACK_LOCAL, 6073 bytes)
17/07/22 16:14:14 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.2 (TID 20) in 26677 ms on hop53 (executor 4) (4/7)
17/07/22 16:15:37 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.2 (TID 23, hop33, executor 3, partition 7, RACK_LOCAL, 6073 bytes)
17/07/22 16:15:37 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.2 (TID 21) in 100473 ms on hop33 (executor 3) (5/7)
17/07/22 16:15:54 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.2 (TID 23) in 16933 ms on hop33 (executor 3) (6/7)
17/07/22 16:16:05 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.2 (TID 22) in 110737 ms on hop53 (executor 4) (7/7)
17/07/22 16:16:05 INFO cluster.YarnScheduler: Removed TaskSet 0.2, whose tasks have all completed, from pool
17/07/22 16:16:05 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (repartition at QDigestPushDownBuilder.scala:31) finished in 278.134 s
17/07/22 16:16:05 INFO scheduler.DAGScheduler: looking for newly runnable stages
17/07/22 16:16:05 INFO scheduler.DAGScheduler: running: Set()
17/07/22 16:16:05 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
17/07/22 16:16:05 INFO scheduler.DAGScheduler: failed: Set()
17/07/22 16:16:05 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[10] at map at QDigestPushDownBuilder.scala:56), which has no missing parents
17/07/22 16:16:05 INFO memory.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 3.9 KB, free 366.0 MB)
17/07/22 16:16:05 INFO memory.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 2.2 KB, free 366.0 MB)
17/07/22 16:16:05 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on 172.16.26.51:48962 (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:16:05 INFO spark.SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:996
17/07/22 16:16:05 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[10] at map at QDigestPushDownBuilder.scala:56)
17/07/22 16:16:05 INFO cluster.YarnScheduler: Adding task set 1.2 with 1 tasks
17/07/22 16:16:05 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.2 (TID 24, hop53, executor 4, partition 0, NODE_LOCAL, 6099 bytes)
17/07/22 16:16:05 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on hop53:33222 (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:16:05 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 172.16.26.53:55755
17/07/22 16:16:05 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 178 bytes
17/07/22 16:16:27 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.2 (TID 24, hop53, executor 4): java.lang.IllegalStateException: unread block data
at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:159)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:189)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:186)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/07/22 16:16:27 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 1.2 (TID 25, hop53, executor 4, partition 0, NODE_LOCAL, 6099 bytes)
17/07/22 16:16:28 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 4.
17/07/22 16:16:28 INFO scheduler.DAGScheduler: Executor lost: 4 (epoch 7)
17/07/22 16:16:28 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 4 from BlockManagerMaster.
17/07/22 16:16:28 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(4, hop53, 33222, None)
17/07/22 16:16:28 INFO storage.BlockManagerMaster: Removed 4 successfully in removeExecutor
17/07/22 16:16:28 INFO scheduler.DAGScheduler: Shuffle files lost for executor: 4 (epoch 7)
17/07/22 16:16:28 INFO scheduler.ShuffleMapStage: ShuffleMapStage 0 is now unavailable on executor 4 (5/8, false)
17/07/22 16:16:28 ERROR cluster.YarnScheduler: Lost executor 4 on hop53: Container marked as failed: container_1500703381725_0003_01_000005 on host: hop53. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
17/07/22 16:16:28 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 1.2 (TID 25, hop53, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Container marked as failed: container_1500703381725_0003_01_000005 on host: hop53. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
17/07/22 16:16:28 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1500703381725_0003_01_000005 on host: hop53. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
17/07/22 16:16:28 INFO storage.BlockManagerMaster: Removal of executor 4 requested
17/07/22 16:16:28 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 4 from BlockManagerMaster.
17/07/22 16:16:28 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asked to remove non-existent executor 4
17/07/22 16:16:28 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 1.2 (TID 26, hop33, executor 3, partition 0, NODE_LOCAL, 6099 bytes)
17/07/22 16:16:28 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on hop33:54991 (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:16:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 172.16.26.33:37925
17/07/22 16:16:28 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 160 bytes
17/07/22 16:16:28 WARN scheduler.TaskSetManager: Lost task 0.2 in stage 1.2 (TID 26, hop33, executor 3): FetchFailed(null, shuffleId=0, mapId=-1, reduceId=0, message=
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:697)
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:693)
)
17/07/22 16:16:28 INFO scheduler.TaskSetManager: Task 0.2 in stage 1.2 (TID 26) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
17/07/22 16:16:28 INFO scheduler.DAGScheduler: Marking ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) as failed due to a fetch failure from ShuffleMapStage 0 (repartition at QDigestPushDownBuilder.scala:31)
17/07/22 16:16:28 INFO cluster.YarnScheduler: Removed TaskSet 1.2, whose tasks have all completed, from pool
17/07/22 16:16:28 INFO scheduler.DAGScheduler: ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) failed in 23.025 s due to org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
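For what it's worth, `java.lang.IllegalStateException: unread block data` thrown inside `JavaDeserializationStream` during a shuffle read is usually reported when the driver and executors deserialize with mismatched classpaths (e.g. different Spark/Hadoop jar versions on the nodes) rather than because of HDFS file permissions; the later `MetadataFetchFailedException` is just fallout from the executors being killed (exit code 143). Below is a rough sketch of a submission one might try while ruling this out. This is only an assumption about the cause, not a confirmed fix, and `qdigest-builder.jar` plus the memory value are placeholders, not the actual application jar or settings:

```shell
# Sketch only: first confirm every node runs the same Spark 2.1.0 /
# Hadoop 2.7.3 build, then submit with an explicit serializer so that
# driver and executors agree on the wire format.
# "qdigest-builder.jar" is a hypothetical name for the application jar.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.executor.memory=2g \
  qdigest-builder.jar
```

If the tasks still die with exit code 143, the executor containers may simply be exceeding their YARN memory limit, in which case raising `spark.executor.memory` (and `spark.yarn.executor.memoryOverhead`) is worth testing before suspecting serialization.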