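Mahout Distributed Program Development: Item-Based Collaborative Filtering (ItemCF)

This post was last edited by 52Pig on 2014-10-26 14:10
Reading guide:
1. What are the steps for implementing ItemCF collaborative filtering with Mahout?
2. How can the various Hadoop HDFS commands be implemented through the API?
3. The Kmeans.java class reports errors; how can this be handled for now?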





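1. Mahout development environment
  In the article "Building a Mahout Project with Maven" we set up a Maven-based Mahout development environment. Here we continue with distributed program development on Mahout.
  This article uses Mahout 0.8.
  Development environment: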
  • Win7 64bit
  • Java 1.6.0_45
  • Maven 3
  • Eclipse Juno Service Release 2
  • Mahout 0.8
  • Hadoop 1.1.2
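  Open pom.xml and set the Mahout version to 0.8:
  <mahout.version>0.8</mahout.version>
  Then download the dependencies:
  ~ mvn clean install
  The class org.conan.mymahout.cluster06.Kmeans.java was written against mahout-0.6, so it reports errors under 0.8. We can comment out this file for now.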
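2. Mahout's Hadoop-based distributed environment
[Figure: hadoop-mahout-cluster-dev.png]
  As the figure above shows, we can develop on either Win7 or Linux and debug in the local environment. Maven and Eclipse are the standard tools in both cases.
  When a Mahout program runs, it automatically deploys the MapReduce algorithm package to the Hadoop cluster, so this way of developing and running is close to a real production environment.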
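3. Implementing ItemCF collaborative filtering with Mahout
  Steps:
  • Prepare the data file: item.csv
  • Java program: HdfsDAO.java
  • Java program: ItemCFHadoop.java
  • Run the program
  • Interpret the recommendation results
1). Prepare the data file: item.csv
  Upload the test data to HDFS. For the single-machine, in-memory experiment, see the article "Building a Mahout Project with Maven".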
  ~ hadoop fs -mkdir /user/hdfs/userCF
  ~ hadoop fs -copyFromLocal /home/conan/datafiles/item.csv /user/hdfs/userCF
  ~ hadoop fs -cat /user/hdfs/userCF/item.csv
  1,101,5.0
  1,102,3.0
  1,103,2.5
  2,101,2.0
  2,102,2.5
  2,103,5.0
  2,104,2.0
  3,101,2.5
  3,104,4.0
  3,105,4.5
  3,107,5.0
  4,101,5.0
  4,103,3.0
  4,104,4.5
  4,106,4.0
  5,101,4.0
  5,102,3.0
  5,103,2.0
  5,104,4.0
  5,105,3.5
  5,106,4.0
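  Each line of item.csv reads as userID,itemID,preference. To make that layout concrete, here is a minimal plain-Java sketch that parses the sample lines and counts ratings, users and items. The class ItemCsvSketch and its methods are hypothetical helpers written for illustration; they are not part of the original project and do not touch HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of the original project.
class ItemCsvSketch {

    // The sample ratings from item.csv: userID,itemID,preference.
    static final String[] LINES = {
        "1,101,5.0", "1,102,3.0", "1,103,2.5",
        "2,101,2.0", "2,102,2.5", "2,103,5.0", "2,104,2.0",
        "3,101,2.5", "3,104,4.0", "3,105,4.5", "3,107,5.0",
        "4,101,5.0", "4,103,3.0", "4,104,4.5", "4,106,4.0",
        "5,101,4.0", "5,102,3.0", "5,103,2.0", "5,104,4.0", "5,105,3.5", "5,106,4.0"
    };

    /** Splits each "userID,itemID,preference" line into its three fields. */
    static List<String[]> parse(String[] lines) {
        List<String[]> rows = new ArrayList<String[]>();
        for (String line : lines) {
            String[] fields = line.trim().split(",");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Bad line: " + line);
            }
            rows.add(fields);
        }
        return rows;
    }

    /** Collects the distinct IDs found in one column (0 = user, 1 = item). */
    static Set<Long> distinct(List<String[]> rows, int column) {
        Set<Long> ids = new TreeSet<Long>();
        for (String[] row : rows) {
            ids.add(Long.parseLong(row[column]));
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String[]> rows = parse(LINES);
        System.out.println("ratings: " + rows.size());
        System.out.println("users:   " + distinct(rows, 0));
        System.out.println("items:   " + distinct(rows, 1));
    }
}
```

  For this data set the sketch counts 21 ratings, 5 users and 7 items, which lines up with the "Map input records=21" and "USERS=5" counters in the console output further down.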
2). Java program: HdfsDAO.java
  HdfsDAO.java is a utility class for HDFS operations that implements the various Hadoop HDFS commands through the API.
  We use the following methods of HdfsDAO here:
          HdfsDAO hdfs = new HdfsDAO(HDFS, conf);
          hdfs.rmr(inPath);
          hdfs.mkdirs(inPath);
          hdfs.copyFile(localFile, inPath);
          hdfs.ls(inPath);
          hdfs.cat(inFile);
3). Java program: ItemCFHadoop.java
  For the distributed algorithm implemented with Mahout, see the explanation in Mahout in Action.
[Figure: aglorithm_2.jpg]

  Implementation:
  package org.conan.mymahout.recommendation;

  import org.apache.hadoop.mapred.JobConf;
  import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;
  import org.conan.mymahout.hdfs.HdfsDAO;

  public class ItemCFHadoop {

      private static final String HDFS = "hdfs://192.168.1.210:9000";

      public static void main(String[] args) throws Exception {
          String localFile = "datafile/item.csv";
          String inPath = HDFS + "/user/hdfs/userCF";
          String inFile = inPath + "/item.csv";
          String outPath = HDFS + "/user/hdfs/userCF/result/";
          // outPath already ends with "/", so no extra separator is added here
          String outFile = outPath + "part-r-00000";
          String tmpPath = HDFS + "/tmp/" + System.currentTimeMillis();

          JobConf conf = config();

          // Recreate the input directory on HDFS and upload the sample data
          HdfsDAO hdfs = new HdfsDAO(HDFS, conf);
          hdfs.rmr(inPath);
          hdfs.mkdirs(inPath);
          hdfs.copyFile(localFile, inPath);
          hdfs.ls(inPath);
          hdfs.cat(inFile);

          // Build the command-line arguments for RecommenderJob
          StringBuilder sb = new StringBuilder();
          sb.append("--input ").append(inPath);
          sb.append(" --output ").append(outPath);
          sb.append(" --booleanData true");
          sb.append(" --similarityClassname org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity");
          sb.append(" --tempDir ").append(tmpPath);
          args = sb.toString().split(" ");

          // Run the distributed item-based recommender
          RecommenderJob job = new RecommenderJob();
          job.setConf(conf);
          job.run(args);

          // Print the recommendations
          hdfs.cat(outFile);
      }

      // Load the Hadoop cluster settings from the classpath
      public static JobConf config() {
          JobConf conf = new JobConf(ItemCFHadoop.class);
          conf.setJobName("ItemCFHadoop");
          conf.addResource("classpath:/hadoop/core-site.xml");
          conf.addResource("classpath:/hadoop/hdfs-site.xml");
          conf.addResource("classpath:/hadoop/mapred-site.xml");
          return conf;
      }
  }
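  The listing above assembles the RecommenderJob arguments as one string and then splits it on spaces. To show what the resulting array looks like, here is a small stand-alone sketch of just that step. The class RecommenderArgsSketch and the method buildArgs are hypothetical names introduced for illustration; the option strings are copied from the listing, and nothing here calls Mahout or Hadoop.

```java
// Hypothetical helper for illustration; only the option strings come from the listing above.
class RecommenderArgsSketch {

    /** Builds the argument string the same way as the listing, then splits it on spaces. */
    static String[] buildArgs(String inPath, String outPath, String tmpPath) {
        StringBuilder sb = new StringBuilder();
        sb.append("--input ").append(inPath);
        sb.append(" --output ").append(outPath);
        sb.append(" --booleanData true");
        sb.append(" --similarityClassname org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity");
        sb.append(" --tempDir ").append(tmpPath);
        return sb.toString().split(" ");
    }

    public static void main(String[] args) {
        String hdfs = "hdfs://192.168.1.210:9000";
        String[] argv = buildArgs(
                hdfs + "/user/hdfs/userCF",
                hdfs + "/user/hdfs/userCF/result/",
                hdfs + "/tmp/" + System.currentTimeMillis());
        // Print one token per line: option names and values alternate
        for (int i = 0; i < argv.length; i++) {
            System.out.println(i + ": " + argv[i]);
        }
    }
}
```

  Splitting on a single space works here because none of the paths contains a space. A path with a space would be cut into two tokens, so for such paths it is safer to fill a String[] directly.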
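  RecommenderJob.java in effect wraps the execution of the entire distributed, parallel algorithm shown in the figure above. Without this layer of encapsulation, we would have to implement the eight MapReduce steps in the figure ourselves.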
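  The internals of RecommenderJob are not shown in this post. As a rough single-machine illustration of the general item-based idea (count how often two items are rated by the same user, then weight those counts by a user's own ratings), here is a minimal sketch on the item.csv data. The class ItemCfSketch and its methods are hypothetical, and the sketch uses plain co-occurrence counts rather than the Euclidean-distance similarity configured in the listing above, so its scores should not be expected to match the job's output.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical single-machine sketch; this is not how RecommenderJob is implemented.
class ItemCfSketch {

    // Sample ratings from item.csv as {userID, itemID, preference}.
    static final double[][] RATINGS = {
        {1, 101, 5.0}, {1, 102, 3.0}, {1, 103, 2.5},
        {2, 101, 2.0}, {2, 102, 2.5}, {2, 103, 5.0}, {2, 104, 2.0},
        {3, 101, 2.5}, {3, 104, 4.0}, {3, 105, 4.5}, {3, 107, 5.0},
        {4, 101, 5.0}, {4, 103, 3.0}, {4, 104, 4.5}, {4, 106, 4.0},
        {5, 101, 4.0}, {5, 102, 3.0}, {5, 103, 2.0}, {5, 104, 4.0}, {5, 105, 3.5}, {5, 106, 4.0}
    };

    /** Groups the ratings by user: userID -> (itemID -> preference). */
    static Map<Long, Map<Long, Double>> byUser(double[][] ratings) {
        Map<Long, Map<Long, Double>> prefs = new TreeMap<Long, Map<Long, Double>>();
        for (double[] r : ratings) {
            long user = (long) r[0];
            Map<Long, Double> items = prefs.get(user);
            if (items == null) {
                items = new TreeMap<Long, Double>();
                prefs.put(user, items);
            }
            items.put((long) r[1], r[2]);
        }
        return prefs;
    }

    /** Number of users who rated both item a and item b. */
    static int cooccurrence(Map<Long, Map<Long, Double>> prefs, long a, long b) {
        int count = 0;
        for (Map<Long, Double> items : prefs.values()) {
            if (items.containsKey(a) && items.containsKey(b)) {
                count++;
            }
        }
        return count;
    }

    /** Sum over the user's rated items j of cooccurrence(item, j) * preference(j). */
    static double score(Map<Long, Map<Long, Double>> prefs, long user, long item) {
        double sum = 0.0;
        for (Map.Entry<Long, Double> e : prefs.get(user).entrySet()) {
            sum += cooccurrence(prefs, item, e.getKey()) * e.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> prefs = byUser(RATINGS);
        Set<Long> allItems = new TreeSet<Long>();
        for (Map<Long, Double> items : prefs.values()) {
            allItems.addAll(items.keySet());
        }
        long user = 1L;
        // Score only the items this user has not rated yet
        for (long item : allItems) {
            if (!prefs.get(user).containsKey(item)) {
                System.out.println("user " + user + ", item " + item + ", score " + score(prefs, user, item));
            }
        }
    }
}
```

  As a worked check, user 1 rated items 101, 102 and 103 with 5.0, 3.0 and 2.5. Item 104 co-occurs with them 4, 2 and 3 times, so its score for user 1 is 4*5.0 + 2*3.0 + 3*2.5 = 33.5.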
4). Run the program
  Console output:
  Delete: hdfs://192.168.1.210:9000/user/hdfs/userCF
  Create: hdfs://192.168.1.210:9000/user/hdfs/userCF
  copy from: datafile/item.csv to hdfs://192.168.1.210:9000/user/hdfs/userCF
  ls: hdfs://192.168.1.210:9000/user/hdfs/userCF
  ==========================================================
  name: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv, folder: false, size: 229
  ==========================================================
  cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv
  1,101,5.0
  1,102,3.0
  1,103,2.5
  2,101,2.0
  2,102,2.5
  2,103,5.0
  2,104,2.0
  3,101,2.5
  3,104,4.0
  3,105,4.5
  3,107,5.0
  4,101,5.0
  4,103,3.0
  4,104,4.5
  4,106,4.0
  5,101,4.0
  5,102,3.0
  5,103,2.0
  5,104,4.0
  5,105,3.5
  5,106,4.0
  SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
  SLF4J: Defaulting to no-operation (NOP) logger implementation
  SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
  2013-10-14 10:26:35 org.apache.hadoop.util.NativeCodeLoader
  WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  2013-10-14 10:26:35 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
  INFO: Total input paths to process : 1
  2013-10-14 10:26:35 org.apache.hadoop.io.compress.snappy.LoadSnappy
  WARNING: Snappy native library not loaded
  2013-10-14 10:26:36 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO: Running job: job_local_0001
  2013-10-14 10:26:36 org.apache.hadoop.mapred.Task initialize
  INFO:  Using ResourceCalculatorPlugin : null
  2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: io.sort.mb = 100
  2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: data buffer = 79691776/99614720
  2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: record buffer = 262144/327680
  2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
  INFO: Starting flush of map output
  2013-10-14 10:26:36 org.apache.hadoop.io.compress.CodecPool getCompressor
  INFO: Got brand-new compressor
  2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
  INFO: Finished spill 0
  2013-10-14 10:26:36 org.apache.hadoop.mapred.Task done
  INFO: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
  2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:36 org.apache.hadoop.mapred.Task sendDone
  INFO: Task 'attempt_local_0001_m_000000_0' done.
  2013-10-14 10:26:36 org.apache.hadoop.mapred.Task initialize
  INFO:  Using ResourceCalculatorPlugin : null
  2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:36 org.apache.hadoop.mapred.Merger$MergeQueue merge
  INFO: Merging 1 sorted segments
  2013-10-14 10:26:36 org.apache.hadoop.io.compress.CodecPool getDecompressor
  INFO: Got brand-new decompressor
  2013-10-14 10:26:36 org.apache.hadoop.mapred.Merger$MergeQueue merge
  INFO: Down to the last merge-pass, with 1 segments left of total size: 42 bytes
  2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:36 org.apache.hadoop.mapred.Task done
  INFO: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
  2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:36 org.apache.hadoop.mapred.Task commit
  INFO: Task attempt_local_0001_r_000000_0 is allowed to commit now
  2013-10-14 10:26:36 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
  INFO: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/itemIDIndex
  2013-10-14 10:26:36 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO: reduce > reduce
  2013-10-14 10:26:36 org.apache.hadoop.mapred.Task sendDone
  INFO: Task 'attempt_local_0001_r_000000_0' done.
  2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO:  map 100% reduce 100%
  2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO: Job complete: job_local_0001
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO: Counters: 19
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:   File Output Format Counters
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     Bytes Written=187
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:   FileSystemCounters
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     FILE_BYTES_READ=3287330
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     HDFS_BYTES_READ=916
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     FILE_BYTES_WRITTEN=3443292
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     HDFS_BYTES_WRITTEN=645
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:   File Input Format Counters
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     Bytes Read=229
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:   Map-Reduce Framework
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     Map output materialized bytes=46
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     Map input records=21
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce shuffle bytes=0
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     Spilled Records=14
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     Map output bytes=84
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     Total committed heap usage (bytes)=376569856
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     SPLIT_RAW_BYTES=116
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     Combine input records=21
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce input records=7
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce input groups=7
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     Combine output records=7
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce output records=7
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Counters log
  INFO:     Map output records=21
  2013-10-14 10:26:37 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
  INFO: Total input paths to process : 1
  2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO: Running job: job_local_0002
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Task initialize
  INFO:  Using ResourceCalculatorPlugin : null
  2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: io.sort.mb = 100
  2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: data buffer = 79691776/99614720
  2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: record buffer = 262144/327680
  2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
  INFO: Starting flush of map output
  2013-10-14 10:26:37 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
  INFO: Finished spill 0
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Task done
  INFO: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
  2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Task sendDone
  INFO: Task 'attempt_local_0002_m_000000_0' done.
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Task initialize
  INFO:  Using ResourceCalculatorPlugin : null
  2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Merger$MergeQueue merge
  INFO: Merging 1 sorted segments
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Merger$MergeQueue merge
  INFO: Down to the last merge-pass, with 1 segments left of total size: 68 bytes
  2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Task done
  INFO: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
  2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Task commit
  INFO: Task attempt_local_0002_r_000000_0 is allowed to commit now
  2013-10-14 10:26:37 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
  INFO: Saved output of task 'attempt_local_0002_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/userVectors
  2013-10-14 10:26:37 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO: reduce > reduce
  2013-10-14 10:26:37 org.apache.hadoop.mapred.Task sendDone
  INFO: Task 'attempt_local_0002_r_000000_0' done.
  2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO:  map 100% reduce 100%
  2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO: Job complete: job_local_0002
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO: Counters: 20
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:   org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     USERS=5
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:   File Output Format Counters
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     Bytes Written=288
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:   FileSystemCounters
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     FILE_BYTES_READ=6574274
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     HDFS_BYTES_READ=1374
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     FILE_BYTES_WRITTEN=6887592
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     HDFS_BYTES_WRITTEN=1120
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:   File Input Format Counters
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     Bytes Read=229
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:   Map-Reduce Framework
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     Map output materialized bytes=72
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     Map input records=21
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce shuffle bytes=0
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     Spilled Records=42
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     Map output bytes=63
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     Total committed heap usage (bytes)=575930368
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     SPLIT_RAW_BYTES=116
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     Combine input records=0
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce input records=21
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce input groups=5
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     Combine output records=0
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce output records=5
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Counters log
  INFO:     Map output records=21
  2013-10-14 10:26:38 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
  INFO: Total input paths to process : 1
  2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO: Running job: job_local_0003
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Task initialize
  INFO:  Using ResourceCalculatorPlugin : null
  2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: io.sort.mb = 100
  2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: data buffer = 79691776/99614720
  2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: record buffer = 262144/327680
  2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
  INFO: Starting flush of map output
  2013-10-14 10:26:38 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
  INFO: Finished spill 0
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Task done
  INFO: Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting
  2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Task sendDone
  INFO: Task 'attempt_local_0003_m_000000_0' done.
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Task initialize
  INFO:  Using ResourceCalculatorPlugin : null
  2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Merger$MergeQueue merge
  INFO: Merging 1 sorted segments
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Merger$MergeQueue merge
  INFO: Down to the last merge-pass, with 1 segments left of total size: 89 bytes
  2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Task done
  INFO: Task:attempt_local_0003_r_000000_0 is done. And is in the process of commiting
  2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Task commit
  INFO: Task attempt_local_0003_r_000000_0 is allowed to commit now
  2013-10-14 10:26:38 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
  INFO: Saved output of task 'attempt_local_0003_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/ratingMatrix
  2013-10-14 10:26:38 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO: reduce > reduce
  2013-10-14 10:26:38 org.apache.hadoop.mapred.Task sendDone
  INFO: Task 'attempt_local_0003_r_000000_0' done.
  2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO:  map 100% reduce 100%
  2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO: Job complete: job_local_0003
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO: Counters: 21
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:   File Output Format Counters
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     Bytes Written=335
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:   org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     USER_RATINGS_NEGLECTED=0
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     USER_RATINGS_USED=21
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:   FileSystemCounters
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     FILE_BYTES_READ=9861349
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     HDFS_BYTES_READ=1950
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     FILE_BYTES_WRITTEN=10331958
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     HDFS_BYTES_WRITTEN=1751
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:   File Input Format Counters
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     Bytes Read=288
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:   Map-Reduce Framework
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     Map output materialized bytes=93
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     Map input records=5
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce shuffle bytes=0
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     Spilled Records=14
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     Map output bytes=336
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     Total committed heap usage (bytes)=775290880
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     SPLIT_RAW_BYTES=157
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     Combine input records=21
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce input records=7
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce input groups=7
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     Combine output records=7
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce output records=7
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Counters log
  INFO:     Map output records=21
  2013-10-14 10:26:39 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
  INFO: Total input paths to process : 1
  2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO: Running job: job_local_0004
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Task initialize
  INFO:  Using ResourceCalculatorPlugin : null
  2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: io.sort.mb = 100
  2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: data buffer = 79691776/99614720
  2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: record buffer = 262144/327680
  2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
  INFO: Starting flush of map output
  2013-10-14 10:26:39 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
  INFO: Finished spill 0
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Task done
  INFO: Task:attempt_local_0004_m_000000_0 is done. And is in the process of commiting
  2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Task sendDone
  INFO: Task 'attempt_local_0004_m_000000_0' done.
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Task initialize
  INFO:  Using ResourceCalculatorPlugin : null
  2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Merger$MergeQueue merge
  INFO: Merging 1 sorted segments
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Merger$MergeQueue merge
  INFO: Down to the last merge-pass, with 1 segments left of total size: 118 bytes
  2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Task done
  INFO: Task:attempt_local_0004_r_000000_0 is done. And is in the process of commiting
  2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Task commit
  INFO: Task attempt_local_0004_r_000000_0 is allowed to commit now
  2013-10-14 10:26:39 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
  INFO: Saved output of task 'attempt_local_0004_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/weights
  2013-10-14 10:26:39 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO: reduce > reduce
  2013-10-14 10:26:39 org.apache.hadoop.mapred.Task sendDone
  INFO: Task 'attempt_local_0004_r_000000_0' done.
  2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO:  map 100% reduce 100%
  2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO: Job complete: job_local_0004
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO: Counters: 20
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:   File Output Format Counters
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     Bytes Written=381
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:   FileSystemCounters
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     FILE_BYTES_READ=13148476
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     HDFS_BYTES_READ=2628
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     FILE_BYTES_WRITTEN=13780408
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     HDFS_BYTES_WRITTEN=2551
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:   File Input Format Counters
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     Bytes Read=335
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     ROWS=7
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:   Map-Reduce Framework
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     Map output materialized bytes=122
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     Map input records=7
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce shuffle bytes=0
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     Spilled Records=16
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     Map output bytes=516
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     Total committed heap usage (bytes)=974651392
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     SPLIT_RAW_BYTES=158
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     Combine input records=24
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce input records=8
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce input groups=8
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     Combine output records=8
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce output records=5
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Counters log
  INFO:     Map output records=24
  2013-10-14 10:26:40 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
  INFO: Total input paths to process : 1
  2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO: Running job: job_local_0005
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Task initialize
  INFO:  Using ResourceCalculatorPlugin : null
  2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: io.sort.mb = 100
  2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: data buffer = 79691776/99614720
  2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  INFO: record buffer = 262144/327680
  2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
  INFO: Starting flush of map output
  2013-10-14 10:26:40 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
  INFO: Finished spill 0
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Task done
  INFO: Task:attempt_local_0005_m_000000_0 is done. And is in the process of commiting
  2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Task sendDone
  INFO: Task 'attempt_local_0005_m_000000_0' done.
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Task initialize
  INFO:  Using ResourceCalculatorPlugin : null
  2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Merger$MergeQueue merge
  INFO: Merging 1 sorted segments
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Merger$MergeQueue merge
  INFO: Down to the last merge-pass, with 1 segments left of total size: 121 bytes
  2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Task done
  INFO: Task:attempt_local_0005_r_000000_0 is done. And is in the process of commiting
  2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO:
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Task commit
  INFO: Task attempt_local_0005_r_000000_0 is allowed to commit now
  2013-10-14 10:26:40 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
  INFO: Saved output of task 'attempt_local_0005_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/pairwiseSimilarity
  2013-10-14 10:26:40 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  INFO: reduce > reduce
  2013-10-14 10:26:40 org.apache.hadoop.mapred.Task sendDone
  INFO: Task 'attempt_local_0005_r_000000_0' done.
  2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO:  map 100% reduce 100%
  2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  INFO: Job complete: job_local_0005
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO: Counters: 21
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:   File Output Format Counters
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     Bytes Written=392
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:   FileSystemCounters
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     FILE_BYTES_READ=16435577
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     HDFS_BYTES_READ=3488
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     FILE_BYTES_WRITTEN=17230010
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     HDFS_BYTES_WRITTEN=3408
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:   File Input Format Counters
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     Bytes Read=381
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     PRUNED_COOCCURRENCES=0
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     COOCCURRENCES=57
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:   Map-Reduce Framework
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     Map output materialized bytes=125
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     Map input records=5
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce shuffle bytes=0
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     Spilled Records=14
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     Map output bytes=744
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     Total committed heap usage (bytes)=1174011904
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     SPLIT_RAW_BYTES=129
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     Combine input records=21
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce input records=7
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce input groups=7
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     Combine output records=7
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  INFO:     Reduce output records=7
  2013-10-14 10:26:41 org.apache.hadoop.mapred.Counters log
  539. 信息:     Map output records=21
  540. 2013-10-14 10:26:41 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
  541. 信息: Total input paths to process : 1
  542. 2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  543. 信息: Running job: job_local_0006
  544. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task initialize
  545. 信息:  Using ResourceCalculatorPlugin : null
  546. 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  547. 信息: io.sort.mb = 100
  548. 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  549. 信息: data buffer = 79691776/99614720
  550. 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  551. 信息: record buffer = 262144/327680
  552. 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
  553. 信息: Starting flush of map output
  554. 2013-10-14 10:26:41 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
  555. 信息: Finished spill 0
  556. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task done
  557. 信息: Task:attempt_local_0006_m_000000_0 is done. And is in the process of commiting
  558. 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  559. 信息:
  560. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task sendDone
  561. 信息: Task 'attempt_local_0006_m_000000_0' done.
  562. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task initialize
  563. 信息:  Using ResourceCalculatorPlugin : null
  564. 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  565. 信息:
  566. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Merger$MergeQueue merge
  567. 信息: Merging 1 sorted segments
  568. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Merger$MergeQueue merge
  569. 信息: Down to the last merge-pass, with 1 segments left of total size: 158 bytes
  570. 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  571. 信息:
  572. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task done
  573. 信息: Task:attempt_local_0006_r_000000_0 is done. And is in the process of commiting
  574. 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  575. 信息:
  576. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task commit
  577. 信息: Task attempt_local_0006_r_000000_0 is allowed to commit now
  578. 2013-10-14 10:26:41 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
  579. 信息: Saved output of task 'attempt_local_0006_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/similarityMatrix
  580. 2013-10-14 10:26:41 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  581. 信息: reduce > reduce
  582. 2013-10-14 10:26:41 org.apache.hadoop.mapred.Task sendDone
  583. 信息: Task 'attempt_local_0006_r_000000_0' done.
  584. 2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  585. 信息:  map 100% reduce 100%
  586. 2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  587. 信息: Job complete: job_local_0006
  588. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  589. 信息: Counters: 19
  590. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  591. 信息:   File Output Format Counters
  592. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  593. 信息:     Bytes Written=554
  594. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  595. 信息:   FileSystemCounters
  596. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  597. 信息:     FILE_BYTES_READ=19722740
  598. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  599. 信息:     HDFS_BYTES_READ=4342
  600. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  601. 信息:     FILE_BYTES_WRITTEN=20674772
  602. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  603. 信息:     HDFS_BYTES_WRITTEN=4354
  604. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  605. 信息:   File Input Format Counters
  606. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  607. 信息:     Bytes Read=392
  608. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  609. 信息:   Map-Reduce Framework
  610. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  611. 信息:     Map output materialized bytes=162
  612. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  613. 信息:     Map input records=7
  614. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  615. 信息:     Reduce shuffle bytes=0
  616. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  617. 信息:     Spilled Records=14
  618. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  619. 信息:     Map output bytes=599
  620. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  621. 信息:     Total committed heap usage (bytes)=1373372416
  622. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  623. 信息:     SPLIT_RAW_BYTES=140
  624. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  625. 信息:     Combine input records=25
  626. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  627. 信息:     Reduce input records=7
  628. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  629. 信息:     Reduce input groups=7
  630. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  631. 信息:     Combine output records=7
  632. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  633. 信息:     Reduce output records=7
  634. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Counters log
  635. 信息:     Map output records=25
  636. 2013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
  637. 信息: Total input paths to process : 1
  638. 2013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
  639. 信息: Total input paths to process : 1
  640. 2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  641. 信息: Running job: job_local_0007
  642. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize
  643. 信息:  Using ResourceCalculatorPlugin : null
  644. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  645. 信息: io.sort.mb = 100
  646. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  647. 信息: data buffer = 79691776/99614720
  648. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  649. 信息: record buffer = 262144/327680
  650. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
  651. 信息: Starting flush of map output
  652. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
  653. 信息: Finished spill 0
  654. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task done
  655. 信息: Task:attempt_local_0007_m_000000_0 is done. And is in the process of commiting
  656. 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  657. 信息:
  658. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone
  659. 信息: Task 'attempt_local_0007_m_000000_0' done.
  660. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize
  661. 信息:  Using ResourceCalculatorPlugin : null
  662. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  663. 信息: io.sort.mb = 100
  664. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  665. 信息: data buffer = 79691776/99614720
  666. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  667. 信息: record buffer = 262144/327680
  668. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
  669. 信息: Starting flush of map output
  670. 2013-10-14 10:26:42 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
  671. 信息: Finished spill 0
  672. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task done
  673. 信息: Task:attempt_local_0007_m_000001_0 is done. And is in the process of commiting
  674. 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  675. 信息:
  676. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone
  677. 信息: Task 'attempt_local_0007_m_000001_0' done.
  678. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task initialize
  679. 信息:  Using ResourceCalculatorPlugin : null
  680. 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  681. 信息:
  682. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Merger$MergeQueue merge
  683. 信息: Merging 2 sorted segments
  684. 2013-10-14 10:26:42 org.apache.hadoop.io.compress.CodecPool getDecompressor
  685. 信息: Got brand-new decompressor
  686. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Merger$MergeQueue merge
  687. 信息: Down to the last merge-pass, with 2 segments left of total size: 233 bytes
  688. 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  689. 信息:
  690. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task done
  691. 信息: Task:attempt_local_0007_r_000000_0 is done. And is in the process of commiting
  692. 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  693. 信息:
  694. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task commit
  695. 信息: Task attempt_local_0007_r_000000_0 is allowed to commit now
  696. 2013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
  697. 信息: Saved output of task 'attempt_local_0007_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/partialMultiply
  698. 2013-10-14 10:26:42 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  699. 信息: reduce > reduce
  700. 2013-10-14 10:26:42 org.apache.hadoop.mapred.Task sendDone
  701. 信息: Task 'attempt_local_0007_r_000000_0' done.
  702. 2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  703. 信息:  map 100% reduce 100%
  704. 2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  705. 信息: Job complete: job_local_0007
  706. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  707. 信息: Counters: 19
  708. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  709. 信息:   File Output Format Counters
  710. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  711. 信息:     Bytes Written=572
  712. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  713. 信息:   FileSystemCounters
  714. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  715. 信息:     FILE_BYTES_READ=34517913
  716. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  717. 信息:     HDFS_BYTES_READ=8751
  718. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  719. 信息:     FILE_BYTES_WRITTEN=36182630
  720. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  721. 信息:     HDFS_BYTES_WRITTEN=7934
  722. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  723. 信息:   File Input Format Counters
  724. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  725. 信息:     Bytes Read=0
  726. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  727. 信息:   Map-Reduce Framework
  728. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  729. 信息:     Map output materialized bytes=241
  730. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  731. 信息:     Map input records=12
  732. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  733. 信息:     Reduce shuffle bytes=0
  734. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  735. 信息:     Spilled Records=56
  736. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  737. 信息:     Map output bytes=453
  738. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  739. 信息:     Total committed heap usage (bytes)=2558459904
  740. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  741. 信息:     SPLIT_RAW_BYTES=665
  742. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  743. 信息:     Combine input records=0
  744. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  745. 信息:     Reduce input records=28
  746. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  747. 信息:     Reduce input groups=7
  748. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  749. 信息:     Combine output records=0
  750. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  751. 信息:     Reduce output records=7
  752. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log
  753. 信息:     Map output records=28
  754. 2013-10-14 10:26:43 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
  755. 信息: Total input paths to process : 1
  756. 2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  757. 信息: Running job: job_local_0008
  758. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task initialize
  759. 信息:  Using ResourceCalculatorPlugin : null
  760. 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  761. 信息: io.sort.mb = 100
  762. 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  763. 信息: data buffer = 79691776/99614720
  764. 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
  765. 信息: record buffer = 262144/327680
  766. 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
  767. 信息: Starting flush of map output
  768. 2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
  769. 信息: Finished spill 0
  770. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task done
  771. 信息: Task:attempt_local_0008_m_000000_0 is done. And is in the process of commiting
  772. 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  773. 信息:
  774. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task sendDone
  775. 信息: Task 'attempt_local_0008_m_000000_0' done.
  776. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task initialize
  777. 信息:  Using ResourceCalculatorPlugin : null
  778. 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  779. 信息:
  780. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Merger$MergeQueue merge
  781. 信息: Merging 1 sorted segments
  782. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Merger$MergeQueue merge
  783. 信息: Down to the last merge-pass, with 1 segments left of total size: 206 bytes
  784. 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  785. 信息:
  786. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task done
  787. 信息: Task:attempt_local_0008_r_000000_0 is done. And is in the process of commiting
  788. 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  789. 信息:
  790. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task commit
  791. 信息: Task attempt_local_0008_r_000000_0 is allowed to commit now
  792. 2013-10-14 10:26:43 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask
  793. 信息: Saved output of task 'attempt_local_0008_r_000000_0' to hdfs://192.168.1.210:9000/user/hdfs/userCF/result
  794. 2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
  795. 信息: reduce > reduce
  796. 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task sendDone
  797. 信息: Task 'attempt_local_0008_r_000000_0' done.
  798. 2013-10-14 10:26:44 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  799. 信息:  map 100% reduce 100%
  800. 2013-10-14 10:26:44 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
  801. 信息: Job complete: job_local_0008
  802. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  803. 信息: Counters: 19
  804. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  805. 信息:   File Output Format Counters
  806. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  807. 信息:     Bytes Written=217
  808. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  809. 信息:   FileSystemCounters
  810. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  811. 信息:     FILE_BYTES_READ=26299802
  812. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  813. 信息:     HDFS_BYTES_READ=7357
  814. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  815. 信息:     FILE_BYTES_WRITTEN=27566408
  816. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  817. 信息:     HDFS_BYTES_WRITTEN=6269
  818. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  819. 信息:   File Input Format Counters
  820. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  821. 信息:     Bytes Read=572
  822. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  823. 信息:   Map-Reduce Framework
  824. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  825. 信息:     Map output materialized bytes=210
  826. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  827. 信息:     Map input records=7
  828. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  829. 信息:     Reduce shuffle bytes=0
  830. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  831. 信息:     Spilled Records=42
  832. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  833. 信息:     Map output bytes=927
  834. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  835. 信息:     Total committed heap usage (bytes)=1971453952
  836. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  837. 信息:     SPLIT_RAW_BYTES=137
  838. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  839. 信息:     Combine input records=0
  840. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  841. 信息:     Reduce input records=21
  842. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  843. 信息:     Reduce input groups=5
  844. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  845. 信息:     Combine output records=0
  846. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  847. 信息:     Reduce output records=5
  848. 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log
  849. 信息:     Map output records=21
  850. cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/result//part-r-00000
  851. 1        [104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]
  852. 2        [106:1.560478,105:1.4795978,107:0.69935876]
  853. 3        [103:1.2475469,106:1.1944525,102:1.1462644]
  854. 4        [102:1.6462644,105:1.5277859,107:0.69935876]
  855. 5        [107:1.1993587]
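5). Interpreting the recommendation results
  The log above can be broken down into three parts:
  • Initializing the environment
  • Running the algorithm
  • Printing the recommendation results
a. Initializing the environment
  The program resets the HDFS data directory and working directory, then uploads the data file.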
  Delete: hdfs://192.168.1.210:9000/user/hdfs/userCF
  Create: hdfs://192.168.1.210:9000/user/hdfs/userCF
  copy from: datafile/item.csv to hdfs://192.168.1.210:9000/user/hdfs/userCF
  ls: hdfs://192.168.1.210:9000/user/hdfs/userCF
  ==========================================================
  name: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv, folder: false, size: 229
  ==========================================================
  cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.cs
b. Running the algorithm
  The eight MapReduce jobs that correspond to the figure above are executed one after another.
Job complete: job_local_0001
Job complete: job_local_0002
Job complete: job_local_0003
Job complete: job_local_0004
Job complete: job_local_0005
Job complete: job_local_0006
Job complete: job_local_0007
Job complete: job_local_0008
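
  To make the idea behind these jobs more concrete, here is a minimal single-machine sketch of the co-occurrence form of ItemCF pictured in the Mahout in Action figure: build an item-by-item co-occurrence matrix, then multiply it by one user's preference vector. This is an illustration only, not what the Mahout jobs compute internally. The class and method names (CooccurrenceSketch, byUser, cooccurrence, score) are made up for this example, and because the run above goes through a similarity measure and further processing, the scores below are not comparable with the values in the log.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the original project.
class CooccurrenceSketch {

    // Ratings copied from item.csv: {userID, itemID, preference}.
    static final double[][] RATINGS = {
        {1, 101, 5.0}, {1, 102, 3.0}, {1, 103, 2.5},
        {2, 101, 2.0}, {2, 102, 2.5}, {2, 103, 5.0}, {2, 104, 2.0},
        {3, 101, 2.5}, {3, 104, 4.0}, {3, 105, 4.5}, {3, 107, 5.0},
        {4, 101, 5.0}, {4, 103, 3.0}, {4, 104, 4.5}, {4, 106, 4.0},
        {5, 101, 4.0}, {5, 102, 3.0}, {5, 103, 2.0}, {5, 104, 4.0},
        {5, 105, 3.5}, {5, 106, 4.0}
    };

    // Groups the ratings into userID -> (itemID -> preference).
    static Map<Integer, Map<Integer, Double>> byUser() {
        Map<Integer, Map<Integer, Double>> users = new TreeMap<Integer, Map<Integer, Double>>();
        for (double[] r : RATINGS) {
            int user = (int) r[0];
            if (!users.containsKey(user)) {
                users.put(user, new TreeMap<Integer, Double>());
            }
            users.get(user).put((int) r[1], r[2]);
        }
        return users;
    }

    // For every item pair (a, b), counts how many users rated both items.
    static Map<Integer, Map<Integer, Integer>> cooccurrence(Map<Integer, Map<Integer, Double>> users) {
        Map<Integer, Map<Integer, Integer>> matrix = new TreeMap<Integer, Map<Integer, Integer>>();
        for (Map<Integer, Double> prefs : users.values()) {
            for (Integer a : prefs.keySet()) {
                if (!matrix.containsKey(a)) {
                    matrix.put(a, new TreeMap<Integer, Integer>());
                }
                Map<Integer, Integer> row = matrix.get(a);
                for (Integer b : prefs.keySet()) {
                    Integer old = row.get(b);
                    row.put(b, old == null ? 1 : old + 1);
                }
            }
        }
        return matrix;
    }

    // Scores every item the user has NOT rated:
    // score(item) = sum over rated items of cooccurrence(item, rated) * preference(rated).
    static Map<Integer, Double> score(int userId) {
        Map<Integer, Map<Integer, Double>> users = byUser();
        Map<Integer, Map<Integer, Integer>> matrix = cooccurrence(users);
        Map<Integer, Double> prefs = users.get(userId);
        Map<Integer, Double> scores = new TreeMap<Integer, Double>();
        for (Map.Entry<Integer, Map<Integer, Integer>> row : matrix.entrySet()) {
            if (prefs.containsKey(row.getKey())) {
                continue; // already rated, nothing to recommend
            }
            double sum = 0.0;
            for (Map.Entry<Integer, Double> p : prefs.entrySet()) {
                Integer count = row.getValue().get(p.getKey());
                if (count != null) {
                    sum += count * p.getValue();
                }
            }
            scores.put(row.getKey(), sum);
        }
        return scores;
    }

    public static void main(String[] args) {
        System.out.println("user 3 -> " + score(3));
    }
}
```

  For user 3 this prints user 3 -> {102=20.0, 103=26.5, 106=17.5}. The three items are the ones user 3 has not rated, the same three item IDs that appear for user 3 in the job output, although the ordering and the values differ because the distributed run scores them differently.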
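c. Printing the recommendation results
  Finally, the program prints the result file so that the computed recommendations are easy to inspect.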
  cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/result//part-r-00000
  1        [104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]
  2        [106:1.560478,105:1.4795978,107:0.69935876]
  3        [103:1.2475469,106:1.1944525,102:1.1462644]
  4        [102:1.6462644,105:1.5277859,107:0.69935876]
  5        [107:1.1993587]
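
  Each result line holds a user ID followed by a bracketed list of itemID:score pairs, ordered from the highest score to the lowest. If you want to consume part-r-00000 from another program, a small parser along the following lines may help. It is only a sketch: the class name ResultLineParser and its methods are made up for this example, and it assumes the user ID and the list are separated by whitespace (a tab or spaces), as in the output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of the original project.
class ResultLineParser {

    // Returns the user ID at the start of a line such as "3   [103:1.2,106:1.1]".
    static long parseUser(String line) {
        return Long.parseLong(line.trim().split("\\s+", 2)[0]);
    }

    // Returns itemID -> score in the order the items appear in the line.
    static Map<Long, Double> parseItems(String line) {
        String list = line.trim().split("\\s+", 2)[1].trim();
        // Drop the surrounding square brackets.
        list = list.substring(1, list.length() - 1);
        Map<Long, Double> items = new LinkedHashMap<Long, Double>();
        if (list.isEmpty()) {
            return items; // user with no recommendations
        }
        for (String pair : list.split(",")) {
            String[] kv = pair.split(":");
            items.put(Long.parseLong(kv[0]), Double.parseDouble(kv[1]));
        }
        return items;
    }

    public static void main(String[] args) {
        String line = "3\t[103:1.2475469,106:1.1944525,102:1.1462644]";
        System.out.println(parseUser(line) + " -> " + parseItems(line));
    }
}
```

  A LinkedHashMap is used so that the ranking order of the file is preserved when iterating over the parsed items.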






6 comments

pengsuyun posted on 2014-10-27 07:42:55
Very good, thanks to the moderator!

韩克拉玛寒 posted on 2014-10-27 10:47:23
Very nice, I have shared it.

hb1984 posted on 2014-10-27 15:00:44
Thanks to the original poster for sharing.

蓝骑士 posted on 2014-10-29 23:36:22
Really awesome, thumbs up.

anyhuayong posted on 2014-10-30 08:32:22
Great article, thanks to the original poster for the hard work.