When integrating Spark with Flume in pull mode, I get the following output:

17/09/15 16:55:58 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/09/15 16:55:58 INFO scheduler.DAGScheduler: ResultStage 1 (start at FlumeLogPull.scala:47) finished in 1.139 s
17/09/15 16:55:58 INFO scheduler.DAGScheduler: Job 0 finished: start at FlumeLogPull.scala:47, took 21.803352 s
17/09/15 16:55:58 INFO scheduler.ReceiverTracker: Starting 1 receivers
17/09/15 16:55:58 INFO scheduler.ReceiverTracker: ReceiverTracker started
17/09/15 16:55:58 INFO flume.FlumePollingInputDStream: Slide time = 30000 ms
17/09/15 16:55:58 INFO flume.FlumePollingInputDStream: Storage level = Serialized 1x Replicated
17/09/15 16:55:58 INFO flume.FlumePollingInputDStream: Checkpoint interval = null
17/09/15 16:55:58 INFO flume.FlumePollingInputDStream: Remember interval = 30000 ms
17/09/15 16:55:58 INFO flume.FlumePollingInputDStream: Initialized and validated org.apache.spark.streaming.flume.FlumePollingInputDStream@3c6c8b93
17/09/15 16:55:58 INFO dstream.MappedDStream: Slide time = 30000 ms
17/09/15 16:55:58 INFO dstream.MappedDStream: Storage level = Serialized 1x Replicated
17/09/15 16:55:58 INFO dstream.MappedDStream: Checkpoint interval = null
17/09/15 16:55:58 INFO dstream.MappedDStream: Remember interval = 30000 ms
17/09/15 16:55:58 INFO dstream.MappedDStream: Initialized and validated org.apache.spark.streaming.dstream.MappedDStream@57a527ca
17/09/15 16:55:58 INFO dstream.TransformedDStream: Slide time = 30000 ms
17/09/15 16:55:58 INFO dstream.TransformedDStream: Storage level = Serialized 1x Replicated
17/09/15 16:55:58 INFO dstream.TransformedDStream: Checkpoint interval = null
17/09/15 16:55:58 INFO dstream.TransformedDStream: Remember interval = 30000 ms
17/09/15 16:55:58 INFO dstream.TransformedDStream: Initialized and validated org.apache.spark.streaming.dstream.TransformedDStream@556e827b
17/09/15 16:55:58 INFO dstream.ShuffledDStream: Slide time = 30000 ms
17/09/15 16:55:58 INFO dstream.ShuffledDStream: Storage level = Serialized 1x Replicated
17/09/15 16:55:58 INFO dstream.ShuffledDStream: Checkpoint interval = null
17/09/15 16:55:58 INFO dstream.ShuffledDStream: Remember interval = 30000 ms
17/09/15 16:55:58 INFO dstream.ShuffledDStream: Initialized and validated org.apache.spark.streaming.dstream.ShuffledDStream@288946aa
17/09/15 16:55:58 INFO dstream.MappedDStream: Slide time = 30000 ms
17/09/15 16:55:58 INFO dstream.MappedDStream: Storage level = Serialized 1x Replicated
17/09/15 16:55:58 INFO dstream.MappedDStream: Checkpoint interval = null
17/09/15 16:55:58 INFO dstream.MappedDStream: Remember interval = 30000 ms
17/09/15 16:55:58 INFO dstream.MappedDStream: Initialized and validated org.apache.spark.streaming.dstream.MappedDStream@7061703b
17/09/15 16:55:58 INFO dstream.MappedDStream: Slide time = 30000 ms
17/09/15 16:55:58 INFO dstream.MappedDStream: Storage level = Serialized 1x Replicated
17/09/15 16:55:58 INFO dstream.MappedDStream: Checkpoint interval = null
17/09/15 16:55:58 INFO dstream.MappedDStream: Remember interval = 30000 ms
17/09/15 16:55:58 INFO dstream.MappedDStream: Initialized and validated org.apache.spark.streaming.dstream.MappedDStream@e84928a
17/09/15 16:55:58 INFO dstream.ForEachDStream: Slide time = 30000 ms
17/09/15 16:55:58 INFO dstream.ForEachDStream: Storage level = Serialized 1x Replicated
17/09/15 16:55:58 INFO dstream.ForEachDStream: Checkpoint interval = null
17/09/15 16:55:58 INFO dstream.ForEachDStream: Remember interval = 30000 ms
17/09/15 16:55:58 INFO dstream.ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@2f58b47b
17/09/15 16:55:59 INFO scheduler.DAGScheduler: Got job 1 (start at FlumeLogPull.scala:47) with 1 output partitions
17/09/15 16:55:59 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (start at FlumeLogPull.scala:47)
17/09/15 16:55:59 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/09/15 16:55:59 INFO scheduler.DAGScheduler: Missing parents: List()
17/09/15 16:55:59 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (Receiver 0 ParallelCollectionRDD[3] at makeRDD at ReceiverTracker.scala:610), which has no missing parents
17/09/15 16:55:59 INFO scheduler.ReceiverTracker: Receiver 0 started
17/09/15 16:55:59 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 70.6 KB, free 413.8 MB)
17/09/15 16:55:59 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 25.1 KB, free 413.8 MB)
17/09/15 16:55:59 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.28.41.193:34906 (size: 25.1 KB, free: 413.9 MB)
17/09/15 16:55:59 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1012
17/09/15 16:55:59 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (Receiver 0 ParallelCollectionRDD[3] at makeRDD at ReceiverTracker.scala:610)
17/09/15 16:55:59 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
17/09/15 16:55:59 INFO util.RecurringTimer: Started timer for JobGenerator at time 1505465760000
17/09/15 16:55:59 INFO scheduler.JobGenerator: Started JobGenerator at 1505465760000 ms
17/09/15 16:55:59 INFO scheduler.JobScheduler: Started JobScheduler
17/09/15 16:56:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@534e58b6{/streaming,null,AVAILABLE}
17/09/15 16:56:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b495d4{/streaming/json,null,AVAILABLE}
17/09/15 16:56:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@12fe1f28{/streaming/batch,null,AVAILABLE}
17/09/15 16:56:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@26fb4d06{/streaming/batch/json,null,AVAILABLE}
17/09/15 16:56:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2d38edfd{/static/streaming,null,AVAILABLE}
17/09/15 16:56:00 INFO streaming.StreamingContext: StreamingContext started
17/09/15 16:56:00 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 70, 172.28.41.196, partition 0, PROCESS_LOCAL, 7271 bytes)
17/09/15 16:56:00 INFO cluster.CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 70 on executor id: 0 hostname: 172.28.41.196.
17/09/15 16:56:00 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.28.41.196:48603 (size: 25.1 KB, free: 413.9 MB)
17/09/15 16:56:01 INFO scheduler.ReceiverTracker: Registered receiver for stream 0 from 172.28.41.196:33071
17/09/15 16:56:01 INFO scheduler.JobScheduler: Added jobs for time 1505465760000 ms
17/09/15 16:56:01 INFO scheduler.JobScheduler: Starting job streaming job 1505465760000 ms.0 from job set of time 1505465760000 ms
17/09/15 16:56:01 INFO spark.SparkContext: Starting job: print at FlumeLogPull.scala:43
17/09/15 16:56:01 INFO scheduler.DAGScheduler: Registering RDD 7 (union at DStream.scala:605)
17/09/15 16:56:01 INFO scheduler.DAGScheduler: Got job 2 (print at FlumeLogPull.scala:43) with 1 output partitions
17/09/15 16:56:01 INFO scheduler.DAGScheduler: Final stage: ResultStage 4 (print at FlumeLogPull.scala:43)
17/09/15 16:56:01 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 3)
17/09/15 16:56:01 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 3)
17/09/15 16:56:01 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 3 (UnionRDD[7] at union at DStream.scala:605), which has no missing parents
17/09/15 16:56:01 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 3.3 KB, free 413.8 MB)
17/09/15 16:56:01 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 172.28.41.193:34906 in memory (size: 1969.0 B, free: 413.9 MB)
17/09/15 16:56:01 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.0 KB, free 413.8 MB)
17/09/15 16:56:01 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 172.28.41.193:34906 (size: 2.0 KB, free: 413.9 MB)
17/09/15 16:56:01 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1012
17/09/15 16:56:01 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 3 (UnionRDD[7] at union at DStream.scala:605)
17/09/15 16:56:01 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
17/09/15 16:56:02 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 172.28.41.196:48603 in memory (size: 1969.0 B, free: 413.9 MB)
17/09/15 16:56:30 INFO scheduler.JobScheduler: Added jobs for time 1505465790000 ms
17/09/15 16:57:00 INFO scheduler.JobScheduler: Added jobs for time 1505465820000 ms
17/09/15 16:57:30 INFO scheduler.JobScheduler: Added jobs for time 1505465850000 ms

The expected output statements are never printed. What could be the cause?
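For context on what the log is describing: in the pull-based approach, the Spark application polls a custom sink that must be running inside the Flume agent itself; if that sink is missing from the agent, or no events ever reach its channel, the receiver registers and the JobScheduler keeps adding empty batches exactly as shown above. A minimal sketch of the agent-side configuration follows. The agent name `a1`, the channel name `c1`, and the host/port values are placeholder assumptions for illustration, not values taken from this log.

```properties
# Hypothetical Flume agent config for the pull-based Spark sink
# (requires the spark-streaming-flume-sink jar and its Scala/commons-lang
# dependencies on the Flume agent's classpath)
a1.channels = c1
a1.sinks = spark

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

a1.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
a1.sinks.spark.hostname = 172.28.41.196
a1.sinks.spark.port = 41414
a1.sinks.spark.channel = c1
```

Two things worth checking against a setup like this: whether the Flume agent's own log shows events being delivered to the sink's channel, and whether the Spark application has more cores than receivers (with a single core, the long-running receiver task can occupy the only slot, so the batch jobs are queued but never executed, which matches the "Added jobs" lines with no corresponding "Starting job" lines after the first batch).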
Thanks for sharing.