
Submitting a Scala program to Spark fails with "unread block data", please advise

Environment: Hive 2.1.1, Spark 2.1.0, Hadoop 2.7.3. The failing Scala program is shown in Figure 1 (图片1.png).
The web UI shows execution only getting as far as the second line, qdRDD = rdd.map.......
However, the third line's output, "total partition count", is printed, so I believe the failure happens while aggregateBuilder is executed; the calling code is shown in Figure 2 (图片2.png).
This problem has been troubling me for almost a week, so any advice would be greatly appreciated.
I have verified that the file permissions in HDFS are fine: -rw-r--r-- (644), meaning only the owner has read and write access while the group and others have read-only access.
I also tried changing the permissions to 755 and still got the error; I have since re-uploaded the file, so the data file is back to 644. Connectivity between the machines in the cluster is fine. Thanks!
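For readers who cannot see the screenshots, the following is only a rough sketch of the shape of the job, reconstructed from the description above and from the stage names in the log below (repartition at QDigestPushDownBuilder.scala:31, map at line 56, reduce at line 57). The buildDigest and mergeDigests helpers, the input path, and the partition count are placeholders, not the real aggregateBuilder code:

import org.apache.spark.{SparkConf, SparkContext}

object PipelineSketch {
  // Placeholder for whatever aggregateBuilder builds per record; the real
  // structure must be Serializable, because partial results are shipped
  // between executors during the reduce.
  def buildDigest(line: String): Map[String, Long] = Map(line.take(1) -> 1L)

  def mergeDigests(a: Map[String, Long], b: Map[String, Long]): Map[String, Long] =
    (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))).toMap

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pipeline-sketch"))

    // ShuffleMapStage 0 in the log: read the input and repartition it
    val rdd = sc.textFile("hdfs:///path/to/input").repartition(8)
    println("total partition count: " + rdd.getNumPartitions)

    // ResultStage 1 in the log: map each record, then reduce the partial
    // results; the "unread block data" error appears while this stage
    // deserializes the shuffled map output.
    val qdRDD  = rdd.map(line => buildDigest(line))
    val merged = qdRDD.reduce((a, b) => mergeDigests(a, b))
    println(merged.size)

    sc.stop()
  }
}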
17/07/22 16:07:20 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 176 bytes
17/07/22 16:07:46 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 8, hop51, executor 1): java.lang.IllegalStateException: unread block data
        at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:159)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:189)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:186)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
        at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

17/07/22 16:07:46 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 1.0 (TID 9, hop51, executor 1, partition 0, NODE_LOCAL, 6099 bytes)
17/07/22 16:07:47 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 1.
17/07/22 16:07:47 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 1)
17/07/22 16:07:47 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
17/07/22 16:07:47 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, hop51, 56938, None)
17/07/22 16:07:47 INFO storage.BlockManagerMaster: Removed 1 successfully in removeExecutor
17/07/22 16:07:47 INFO scheduler.DAGScheduler: Shuffle files lost for executor: 1 (epoch 1)
17/07/22 16:07:47 INFO scheduler.ShuffleMapStage: ShuffleMapStage 0 is now unavailable on executor 1 (5/8, false)
17/07/22 16:07:47 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1500703381725_0003_01_000002 on host: hop51. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

17/07/22 16:07:47 ERROR cluster.YarnScheduler: Lost executor 1 on hop51: Container marked as failed: container_1500703381725_0003_01_000002 on host: hop51. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

17/07/22 16:07:47 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 1.0 (TID 9, hop51, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container marked as failed: container_1500703381725_0003_01_000002 on host: hop51. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

17/07/22 16:07:47 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
17/07/22 16:07:47 INFO storage.BlockManagerMaster: Removal of executor 1 requested
17/07/22 16:07:47 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asked to remove non-existent executor 1
17/07/22 16:07:47 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 1.0 (TID 10, hop33, executor 2, partition 0, NODE_LOCAL, 6099 bytes)
17/07/22 16:07:47 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on hop33:15645 (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:07:47 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 172.16.26.33:37905
17/07/22 16:07:47 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 161 bytes
17/07/22 16:07:47 WARN scheduler.TaskSetManager: Lost task 0.2 in stage 1.0 (TID 10, hop33, executor 2): FetchFailed(null, shuffleId=0, mapId=-1, reduceId=0, message=
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
        at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:697)
        at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:693)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
        at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:693)
        at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:147)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:49)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:109)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:100)
        at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:99)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
        at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

)
17/07/22 16:07:47 INFO scheduler.TaskSetManager: Task 0.2 in stage 1.0 (TID 10) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
17/07/22 16:07:47 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/07/22 16:07:47 INFO scheduler.DAGScheduler: Marking ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) as failed due to a fetch failure from ShuffleMapStage 0 (repartition at QDigestPushDownBuilder.scala:31)
17/07/22 16:07:47 INFO scheduler.DAGScheduler: ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) failed in 27.773 s due to org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
        at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:697)
        at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:693)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
        at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:693)
        at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:147)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:49)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:109)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:100)
        at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:99)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
        at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

17/07/22 16:07:47 INFO scheduler.DAGScheduler: Resubmitting ShuffleMapStage 0 (repartition at QDigestPushDownBuilder.scala:31) and ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) due to fetch failure
17/07/22 16:07:47 INFO scheduler.DAGScheduler: Resubmitting failed stages
17/07/22 16:07:47 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[6] at repartition at QDigestPushDownBuilder.scala:31), which has no missing parents
17/07/22 16:07:47 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 6.7 KB, free 366.0 MB)
17/07/22 16:07:47 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 3.5 KB, free 366.0 MB)
17/07/22 16:07:47 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 172.16.26.51:48962 (size: 3.5 KB, free: 366.3 MB)
17/07/22 16:07:47 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:996
17/07/22 16:07:47 INFO scheduler.DAGScheduler: Submitting 3 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[6] at repartition at QDigestPushDownBuilder.scala:31)
17/07/22 16:07:47 INFO cluster.YarnScheduler: Adding task set 0.1 with 3 tasks
17/07/22 16:07:47 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.1 (TID 11, hop33, executor 2, partition 0, RACK_LOCAL, 6073 bytes)
17/07/22 16:07:47 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on hop33:15645 (size: 3.5 KB, free: 366.3 MB)
17/07/22 16:07:54 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (172.16.26.33:37925) with ID 3
17/07/22 16:07:54 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.1 (TID 12, hop33, executor 3, partition 1, RACK_LOCAL, 6073 bytes)
17/07/22 16:07:54 INFO storage.BlockManagerMasterEndpoint: Registering block manager hop33:54991 with 366.3 MB RAM, BlockManagerId(3, hop33, 54991, None)
17/07/22 16:08:08 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on hop33:54991 (size: 3.5 KB, free: 366.3 MB)
17/07/22 16:08:08 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hop33:54991 (size: 23.8 KB, free: 366.3 MB)
17/07/22 16:09:17 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on 172.16.26.51:48962 in memory (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:09:17 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on hop33:15645 in memory (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:09:24 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.1 (TID 13, hop33, executor 2, partition 5, RACK_LOCAL, 6073 bytes)
17/07/22 16:09:24 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.1 (TID 11) in 96646 ms on hop33 (executor 2) (1/3)
17/07/22 16:09:57 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.1 (TID 12) in 123006 ms on hop33 (executor 3) (2/3)
17/07/22 16:11:08 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.1 (TID 13) in 104128 ms on hop33 (executor 2) (3/3)
17/07/22 16:11:08 INFO cluster.YarnScheduler: Removed TaskSet 0.1, whose tasks have all completed, from pool
17/07/22 16:11:08 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (repartition at QDigestPushDownBuilder.scala:31) finished in 200.774 s
17/07/22 16:11:08 INFO scheduler.DAGScheduler: looking for newly runnable stages
17/07/22 16:11:08 INFO scheduler.DAGScheduler: running: Set()
17/07/22 16:11:08 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
17/07/22 16:11:08 INFO scheduler.DAGScheduler: failed: Set()
17/07/22 16:11:08 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[10] at map at QDigestPushDownBuilder.scala:56), which has no missing parents
17/07/22 16:11:08 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 3.9 KB, free 366.0 MB)
17/07/22 16:11:08 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 2.2 KB, free 366.0 MB)
17/07/22 16:11:08 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 172.16.26.51:48962 (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:11:08 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:996
17/07/22 16:11:08 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[10] at map at QDigestPushDownBuilder.scala:56)
17/07/22 16:11:08 INFO cluster.YarnScheduler: Adding task set 1.1 with 1 tasks
17/07/22 16:11:08 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.1 (TID 14, hop33, executor 2, partition 0, NODE_LOCAL, 6099 bytes)
17/07/22 16:11:08 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on hop33:15645 (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:11:08 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 172.16.26.33:37905
17/07/22 16:11:08 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 174 bytes
17/07/22 16:11:27 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.1 (TID 14, hop33, executor 2): java.lang.IllegalStateException: unread block data
        at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:159)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:189)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:186)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
        at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

17/07/22 16:11:27 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 1.1 (TID 15, hop33, executor 2, partition 0, NODE_LOCAL, 6099 bytes)
17/07/22 16:11:27 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 1.1 (TID 15, hop33, executor 2): FetchFailed(BlockManagerId(2, hop33, 15645, None), shuffleId=0, mapId=0, reduceId=0, message=
org.apache.spark.shuffle.FetchFailedException: /home/user/spark2/apps/data/tmp/nm-local-dir/usercache/spark2/appcache/application_1500703381725_0003/blockmgr-7eb0a7b9-a7a9-4172-bf1c-eb67d4103d77/30/shuffle_0_0_0.index (No such file or directory)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:357)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
        at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /home/user/spark2/apps/data/tmp/nm-local-dir/usercache/spark2/appcache/application_1500703381725_0003/blockmgr-7eb0a7b9-a7a9-4172-bf1c-eb67d4103d77/30/shuffle_0_0_0.index (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:199)
        at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:302)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:258)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:292)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:120)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:45)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:109)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:100)
        at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:99)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        ... 18 more

)
17/07/22 16:11:27 INFO scheduler.TaskSetManager: Task 0.1 in stage 1.1 (TID 15) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
17/07/22 16:11:27 INFO cluster.YarnScheduler: Removed TaskSet 1.1, whose tasks have all completed, from pool
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Marking ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) as failed due to a fetch failure from ShuffleMapStage 0 (repartition at QDigestPushDownBuilder.scala:31)
17/07/22 16:11:27 INFO scheduler.DAGScheduler: ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) failed in 18.613 s due to org.apache.spark.shuffle.FetchFailedException: /home/user/spark2/apps/data/tmp/nm-local-dir/usercache/spark2/appcache/application_1500703381725_0003/blockmgr-7eb0a7b9-a7a9-4172-bf1c-eb67d4103d77/30/shuffle_0_0_0.index (No such file or directory)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:357)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
        at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /home/user/spark2/apps/data/tmp/nm-local-dir/usercache/spark2/appcache/application_1500703381725_0003/blockmgr-7eb0a7b9-a7a9-4172-bf1c-eb67d4103d77/30/shuffle_0_0_0.index (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:199)
        at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:302)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:258)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:292)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:120)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:45)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:109)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:100)
        at org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:99)
        at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
        ... 18 more

17/07/22 16:11:27 INFO scheduler.DAGScheduler: Resubmitting ShuffleMapStage 0 (repartition at QDigestPushDownBuilder.scala:31) and ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) due to fetch failure
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Executor lost: 2 (epoch 3)
17/07/22 16:11:27 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
17/07/22 16:11:27 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, hop33, 15645, None)
17/07/22 16:11:27 INFO storage.BlockManagerMaster: Removed 2 successfully in removeExecutor
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Shuffle files lost for executor: 2 (epoch 3)
17/07/22 16:11:27 INFO scheduler.ShuffleMapStage: ShuffleMapStage 0 is now unavailable on executor 2 (1/8, false)
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Resubmitting failed stages
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[6] at repartition at QDigestPushDownBuilder.scala:31), which has no missing parents
17/07/22 16:11:27 INFO memory.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 6.7 KB, free 366.0 MB)
17/07/22 16:11:27 INFO memory.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 3.5 KB, free 366.0 MB)
17/07/22 16:11:27 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on 172.16.26.51:48962 (size: 3.5 KB, free: 366.3 MB)
17/07/22 16:11:27 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:996
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Submitting 7 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[6] at repartition at QDigestPushDownBuilder.scala:31)
17/07/22 16:11:27 INFO cluster.YarnScheduler: Adding task set 0.2 with 7 tasks
17/07/22 16:11:27 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.2 (TID 16, hop33, executor 2, partition 2, NODE_LOCAL, 6073 bytes)
17/07/22 16:11:27 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.2 (TID 17, hop33, executor 3, partition 6, NODE_LOCAL, 6073 bytes)
17/07/22 16:11:27 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on hop33:54991 (size: 3.5 KB, free: 366.3 MB)
17/07/22 16:11:27 WARN server.TransportChannelHandler: Exception in connection from /172.16.26.33:37905
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:275)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:652)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:745)
17/07/22 16:11:27 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 2.
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Executor lost: 2 (epoch 5)
17/07/22 16:11:27 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
17/07/22 16:11:27 INFO storage.BlockManagerMaster: Removed 2 successfully in removeExecutor
17/07/22 16:11:27 INFO scheduler.DAGScheduler: Shuffle files lost for executor: 2 (epoch 5)
17/07/22 16:11:27 ERROR cluster.YarnScheduler: Lost executor 2 on hop33: Container marked as failed: container_1500703381725_0003_01_000003 on host: hop33. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

17/07/22 16:11:27 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1500703381725_0003_01_000003 on host: hop33. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

17/07/22 16:11:27 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 0.2 (TID 16, hop33, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_1500703381725_0003_01_000003 on host: hop33. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

17/07/22 16:11:27 INFO storage.BlockManagerMaster: Removal of executor 2 requested
17/07/22 16:11:27 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
17/07/22 16:11:27 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asked to remove non-existent executor 2
17/07/22 16:11:52 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (172.16.26.53:55755) with ID 4
17/07/22 16:11:52 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.2 (TID 18, hop53, executor 4, partition 0, NODE_LOCAL, 6073 bytes)
17/07/22 16:11:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager hop53:33222 with 366.3 MB RAM, BlockManagerId(4, hop53, 33222, None)
17/07/22 16:12:08 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on hop53:33222 (size: 3.5 KB, free: 366.3 MB)
17/07/22 16:12:08 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hop53:33222 (size: 23.8 KB, free: 366.3 MB)
17/07/22 16:12:38 INFO scheduler.TaskSetManager: Starting task 1.1 in stage 0.2 (TID 19, hop33, executor 3, partition 2, NODE_LOCAL, 6073 bytes)
17/07/22 16:12:38 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.2 (TID 17) in 70849 ms on hop33 (executor 3) (1/7)
17/07/22 16:13:48 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.2 (TID 20, hop53, executor 4, partition 3, RACK_LOCAL, 6073 bytes)
17/07/22 16:13:48 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.2 (TID 18) in 116134 ms on hop53 (executor 4) (2/7)
17/07/22 16:13:57 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.2 (TID 21, hop33, executor 3, partition 4, RACK_LOCAL, 6073 bytes)
17/07/22 16:13:57 INFO scheduler.TaskSetManager: Finished task 1.1 in stage 0.2 (TID 19) in 78986 ms on hop33 (executor 3) (3/7)
17/07/22 16:14:14 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.2 (TID 22, hop53, executor 4, partition 5, RACK_LOCAL, 6073 bytes)
17/07/22 16:14:14 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.2 (TID 20) in 26677 ms on hop53 (executor 4) (4/7)
17/07/22 16:15:37 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.2 (TID 23, hop33, executor 3, partition 7, RACK_LOCAL, 6073 bytes)
17/07/22 16:15:37 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.2 (TID 21) in 100473 ms on hop33 (executor 3) (5/7)
17/07/22 16:15:54 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.2 (TID 23) in 16933 ms on hop33 (executor 3) (6/7)
17/07/22 16:16:05 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.2 (TID 22) in 110737 ms on hop53 (executor 4) (7/7)
17/07/22 16:16:05 INFO cluster.YarnScheduler: Removed TaskSet 0.2, whose tasks have all completed, from pool
17/07/22 16:16:05 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (repartition at QDigestPushDownBuilder.scala:31) finished in 278.134 s
17/07/22 16:16:05 INFO scheduler.DAGScheduler: looking for newly runnable stages
17/07/22 16:16:05 INFO scheduler.DAGScheduler: running: Set()
17/07/22 16:16:05 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
17/07/22 16:16:05 INFO scheduler.DAGScheduler: failed: Set()
17/07/22 16:16:05 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[10] at map at QDigestPushDownBuilder.scala:56), which has no missing parents
17/07/22 16:16:05 INFO memory.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 3.9 KB, free 366.0 MB)
17/07/22 16:16:05 INFO memory.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 2.2 KB, free 366.0 MB)
17/07/22 16:16:05 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on 172.16.26.51:48962 (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:16:05 INFO spark.SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:996
17/07/22 16:16:05 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[10] at map at QDigestPushDownBuilder.scala:56)
17/07/22 16:16:05 INFO cluster.YarnScheduler: Adding task set 1.2 with 1 tasks
17/07/22 16:16:05 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.2 (TID 24, hop53, executor 4, partition 0, NODE_LOCAL, 6099 bytes)
17/07/22 16:16:05 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on hop53:33222 (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:16:05 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 172.16.26.53:55755
17/07/22 16:16:05 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 178 bytes
17/07/22 16:16:27 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.2 (TID 24, hop53, executor 4): java.lang.IllegalStateException: unread block data
        at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:159)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:189)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:186)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
        at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
        at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
        at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
        at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

17/07/22 16:16:27 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 1.2 (TID 25, hop53, executor 4, partition 0, NODE_LOCAL, 6099 bytes)
17/07/22 16:16:28 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 4.
17/07/22 16:16:28 INFO scheduler.DAGScheduler: Executor lost: 4 (epoch 7)
17/07/22 16:16:28 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 4 from BlockManagerMaster.
17/07/22 16:16:28 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(4, hop53, 33222, None)
17/07/22 16:16:28 INFO storage.BlockManagerMaster: Removed 4 successfully in removeExecutor
17/07/22 16:16:28 INFO scheduler.DAGScheduler: Shuffle files lost for executor: 4 (epoch 7)
17/07/22 16:16:28 INFO scheduler.ShuffleMapStage: ShuffleMapStage 0 is now unavailable on executor 4 (5/8, false)
17/07/22 16:16:28 ERROR cluster.YarnScheduler: Lost executor 4 on hop53: Container marked as failed: container_1500703381725_0003_01_000005 on host: hop53. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

17/07/22 16:16:28 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 1.2 (TID 25, hop53, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Container marked as failed: container_1500703381725_0003_01_000005 on host: hop53. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

17/07/22 16:16:28 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1500703381725_0003_01_000005 on host: hop53. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

17/07/22 16:16:28 INFO storage.BlockManagerMaster: Removal of executor 4 requested
17/07/22 16:16:28 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 4 from BlockManagerMaster.
17/07/22 16:16:28 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asked to remove non-existent executor 4
17/07/22 16:16:28 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 1.2 (TID 26, hop33, executor 3, partition 0, NODE_LOCAL, 6099 bytes)
17/07/22 16:16:28 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on hop33:54991 (size: 2.2 KB, free: 366.3 MB)
17/07/22 16:16:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 172.16.26.33:37925
17/07/22 16:16:28 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 160 bytes
17/07/22 16:16:28 WARN scheduler.TaskSetManager: Lost task 0.2 in stage 1.2 (TID 26, hop33, executor 3): FetchFailed(null, shuffleId=0, mapId=-1, reduceId=0, message=
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
        at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:697)
        at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:693)
       

)
17/07/22 16:16:28 INFO scheduler.TaskSetManager: Task 0.2 in stage 1.2 (TID 26) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
17/07/22 16:16:28 INFO scheduler.DAGScheduler: Marking ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) as failed due to a fetch failure from ShuffleMapStage 0 (repartition at QDigestPushDownBuilder.scala:31)
17/07/22 16:16:28 INFO cluster.YarnScheduler: Removed TaskSet 1.2, whose tasks have all completed, from pool
17/07/22 16:16:28 INFO scheduler.DAGScheduler: ResultStage 1 (reduce at QDigestPushDownBuilder.scala:57) failed in 23.025 s due to org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
       







Attachments: 图片1.png (Figure 1, the Scala program), 图片2.png (Figure 2, the aggregateBuilder call site)

3 replies so far

yangyixin replied on 2017-7-22 16:36:09
I just noticed that the job runs fine on 400 MB of data, but once I increase it to 1 GB it falls over. Could this be a configuration issue?

starrycheng replied on 2017-7-22 17:03:32
Quoting yangyixin (2017-7-22 16:36):
I just noticed that the job runs fine on 400 MB of data, but once I increase it to 1 GB it falls over. Could this be a configuration issue?

How are you submitting the job? Please post the relevant command.
I suggest increasing the resource-related parameters at submit time.
For example, all of the following can be increased:
spark.executor.extraJavaOptions
driver-java-options -XX:MaxPermSize
driver-memory
executor-memory
Example:
./spark-submit --class com.xyz.MySpark --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=1024M" --driver-java-options -XX:MaxPermSize=1024m --driver-memory 4g --master yarn-client --executor-memory 2G --executor-cores 8 --num-executors 15  /home/myuser/myspark-1.0.jar
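For reference, here is a minimal sketch of setting the same executor options programmatically in Scala. The values are only illustrative, not the ones the poster ended up using; note that on YARN the driver memory in client mode still has to come from spark-submit or spark-defaults.conf, because the driver JVM is already running before a SparkConf set in code can take effect.

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only; tune them to the cluster. Executor settings
// placed here are picked up when the application starts on YARN, but
// driver memory in client mode must still be passed to spark-submit.
val conf = new SparkConf()
  .setAppName("MySpark")
  .set("spark.executor.memory", "2g")
  .set("spark.executor.cores", "8")
  .set("spark.executor.instances", "15")
  .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=1024M")

val sc = new SparkContext(conf)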


yangyixin replied on 2017-7-22 18:15:39
Quoting starrycheng (2017-7-22 17:03):
How are you submitting the job? Please post the relevant command.
I suggest increasing the resource-related parameters at submit time.
For example:

After adjusting the parameters, the job does indeed run now. Thank you!