How to run MapReduce on Hadoop 2.x from Eclipse

Has anyone run a Hadoop 2.2 MapReduce job from inside Eclipse? If so, could you share an example? Please don't use hadoop jar to run it.

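sstutu · Posted 2014-4-29 17:05:52

I could not find any source code in hadoop-2.2.0.tar.gz (the new release ships neither the Eclipse plugin nor the source, only compiled .class bytecode). You can download hadoop-2.2.0-src.tar.gz instead, extract it, and get the source from the hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples directory.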

/** 
 * Licensed to the Apache Software Foundation (ASF) under one 
 * or more contributor license agreements.  See the NOTICE file 
 * distributed with this work for additional information 
 * regarding copyright ownership.  The ASF licenses this file 
 * to you under the Apache License, Version 2.0 (the 
 * "License"); you may not use this file except in compliance 
 * with the License.  You may obtain a copy of the License at 
 * 
 *     http://www.apache.org/licenses/LICENSE-2.0 
 * 
 * Unless required by applicable law or agreed to in writing, software 
 * distributed under the License is distributed on an "AS IS" BASIS, 
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
 * See the License for the specific language governing permissions and 
 * limitations under the License. 
 */  
package org.apache.hadoop.examples;  
  
import java.io.IOException;  
import java.util.StringTokenizer;  
  
import org.apache.hadoop.conf.Configuration;  
import org.apache.hadoop.fs.Path;  
import org.apache.hadoop.io.IntWritable;  
import org.apache.hadoop.io.Text;  
import org.apache.hadoop.mapreduce.Job;  
import org.apache.hadoop.mapreduce.Mapper;  
import org.apache.hadoop.mapreduce.Reducer;  
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;  
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;  
import org.apache.hadoop.util.GenericOptionsParser;  
  
public class WordCount {  
  
  public static class TokenizerMapper   
       extends Mapper<Object, Text, Text, IntWritable>{  
      
    private final static IntWritable one = new IntWritable(1);  
    private Text word = new Text();  
    // value already holds one line of the input file
    public void map(Object key, Text value, Context context  
                    ) throws IOException, InterruptedException {  
      StringTokenizer itr = new StringTokenizer(value.toString());  
      while (itr.hasMoreTokens()) {  
        word.set(itr.nextToken());  
        context.write(word, one);  
      }  
    }  
  }  
    
  public static class IntSumReducer   
       extends Reducer<Text,IntWritable,Text,IntWritable> {  
    private IntWritable result = new IntWritable();  
  
    public void reduce(Text key, Iterable<IntWritable> values,   
                       Context context  
                       ) throws IOException, InterruptedException {  
      int sum = 0;  
      for (IntWritable val : values) {  
        sum += val.get();  
      }  
      result.set(sum);  
      context.write(key, result);  
    }  
  }  
  
  public static void main(String[] args) throws Exception {  
    Configuration conf = new Configuration();  
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();  
    if (otherArgs.length != 2) {  
      System.err.println("Usage: wordcount <in> <out>");  
      System.exit(2);  
    }  
    Job job = new Job(conf, "word count");  
    job.setJarByClass(WordCount.class);  
    job.setMapperClass(TokenizerMapper.class);  
    job.setCombinerClass(IntSumReducer.class);  
    job.setReducerClass(IntSumReducer.class);  
    job.setOutputKeyClass(Text.class);  
    job.setOutputValueClass(IntWritable.class);  
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));  
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));  
    System.exit(job.waitForCompletion(true) ? 0 : 1);  
  }  
}  
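To make it easier to see what TokenizerMapper and IntSumReducer compute together, here is a minimal plain-Java sketch of the same logic with no Hadoop dependency. The class name LocalWordCountSketch and the sample lines are made up for illustration. It uses a TreeMap to approximate the sorted key order of the job output, which should match for plain ASCII words but is not guaranteed to match Hadoop's byte-wise Text ordering in every case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: reproduces the word-count logic locally.
public class LocalWordCountSketch {

  // Tokenizes each line on whitespace (as TokenizerMapper does) and
  // sums a count of 1 per token (as IntSumReducer does).
  public static Map<String, Integer> count(Iterable<String> lines) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String line : lines) {
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        counts.merge(itr.nextToken(), 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    // Sample input, assumed for illustration only.
    List<String> lines = Arrays.asList("hello hadoop", "hello eclipse");
    for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
      // Word and count separated by a tab, similar to the job's text output.
      System.out.println(e.getKey() + "\t" + e.getValue());
    }
  }
}
```

For the two sample lines this prints eclipse 1, hadoop 1 and hello 2, one pair per line. Running it on a small input file's lines is a quick way to cross-check what the real job writes.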

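Create a MapReduce Project in Eclipse, then add a new Java class, for example MyWordCount, and copy the code from WordCount.java into MyWordCount.java. Since the file is now named MyWordCount.java, rename the class and the WordCount.class reference in setJarByClass to MyWordCount, and adjust the package declaration to match your project. Next, click Run-->Run Configurations…, select Java Application in the left pane of the dialog, pick MyWordCount, and configure the Arguments tab on the right.
Under Program arguments, set the input and output directories:
/home/jack/Desktop/in /home/jack/Desktop/out
Under VM arguments, set the JVM options:
-Xms512m -Xmx1024m -XX:MaxPermSize=256m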
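If you want to confirm that Eclipse is really passing these settings, a tiny throwaway class can print what the JVM received. This is only a sketch: RunConfigCheck is a made-up name, and the heap figure reported by Runtime.maxMemory() is usually somewhat below the -Xmx value rather than exactly equal to it.

```java
// Hypothetical helper for checking an Eclipse run configuration.
public class RunConfigCheck {

  // Builds a short report of the program arguments and the maximum heap size.
  public static String describe(String[] args) {
    long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
    StringBuilder sb = new StringBuilder();
    sb.append("program arguments: ").append(args.length).append('\n');
    for (int i = 0; i < args.length; i++) {
      sb.append("  args[").append(i).append("] = ").append(args[i]).append('\n');
    }
    sb.append("max heap (MB): ").append(maxHeapMb).append('\n');
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(describe(args));
  }
}
```

Run it with the same run configuration as MyWordCount: args[0] and args[1] should show the in and out paths, and the max heap should be roughly in line with -Xmx1024m.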
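Notes:

The in folder must be created before the program runs and must contain the files whose words you want to count. The out folder must not exist beforehand; the job creates it itself. Otherwise the run fails with the error Output directory file:/home/jack/Desktop/out already exists.
The input and output directories here are paths on the local file system.
To run the program, click Run on the menu bar.
When the run finishes, the word counts are in the part-r-00000 file under /home/jack/Desktop/out.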
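Because the out folder has to be absent on every run, re-running from Eclipse means deleting it each time. Here is a minimal sketch of a helper that does this with the plain JDK file API, which should be enough since the paths above are on the local file system. OutputDirCleaner is a made-up name; it deletes recursively, so only point it at a directory you are sure you can lose.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: removes a local job output directory before a re-run.
public class OutputDirCleaner {

  // Recursively deletes dir. Returns true if something was deleted,
  // false if the directory did not exist.
  public static boolean deleteIfExists(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return false;
    }
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      // Reverse order puts children before their parent directories.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("Usage: OutputDirCleaner <out>");
      System.exit(2);
    }
    boolean deleted = deleteIfExists(Paths.get(args[0]));
    System.out.println(deleted ? "deleted " + args[0] : "nothing to delete");
  }
}
```

You could run it with /home/jack/Desktop/out as the single program argument before launching MyWordCount, or call deleteIfExists at the top of your own main.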

nettman · Posted 2014-4-29 20:20:53

Take a look at this: "Summary of developing Hadoop 2.x Map/Reduce projects in Eclipse"
