How to read the API docs and program with the new Hadoop API: old and new API call examples compared for Hadoop 2.4

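Guiding questions:
I have long wanted to write about how the old and new Hadoop APIs relate to each other; for anyone who enjoys programming, this is essential knowledge.
1. Of Hadoop's mapred and mapreduce packages, which one is the old API that has been superseded?
2. How does the old Hadoop API initialize a job?
3. Which method does the new Hadoop API use to initialize a Job object?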

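Program description:
The MapReduce program below does just one thing: it counts the lines in the file booklist.log and outputs the result.

   Two Java source files, each with its own main function, were written: one calls the old package, the other calls the new package.

   a. After creating the MapReduce project, first copy the XML files from Hadoop's configuration directory into the src directory.

   b. Create a conf directory alongside the project's src directory and put a log4j.properties file in it.

   c. Create a bookCount directory under src, then add the Java files shown below.

   d. Right-click and choose "run as application", or choose "run on hadoop" from the Hadoop plugin menu, to run the MapReduce program.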

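Generating the input file to analyze
vi booklist.log

Add the following content:

bookname
bookname
bookname
bookname
bookname
bookname
bookname
bookname
bookname
bookname
bookname
bookname

Save and exit.

Before running, copy the file to the /user/hduser user directory on HDFS with the hdfs copyFromLocal command.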

Old API code using the mapred package

File BookCount.java:

package bookCount;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

public class BookCount {

    public static Logger logger = Logger.getLogger(BookCount.class);

    public static void main(String[] args) throws IOException {
        PropertyConfigurator.configure("conf/log4j.properties");
        logger = Logger.getLogger(BookCount.class);
        logger.info("BookCount starting");
        System.setProperty("HADOOP_USER_NAME", "hduser");

        // Old API: the job is described by a JobConf and submitted through JobClient.
        JobConf conf = new JobConf(BookCount.class);
        conf.setJobName("bookCount_sample_job");
        FileInputFormat.setInputPaths(conf, new Path("booklist.log"));
        FileOutputFormat.setOutputPath(conf, new Path("booklistResultDir"));
        conf.setMapperClass(BookCountMapper.class);
        conf.setReducerClass(BookCountReducer.class);

        // The mapper emits (Text, IntWritable); the reducer emits (Text, LongWritable).
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);

        JobClient.runJob(conf);
    }

    static class BookCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

        // Emit ("booknum", 1) once per input line.
        @Override
        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            output.collect(new Text("booknum"), new IntWritable(1));
            logger.info("foxson_mapper_ok");
            System.out.println("foxsonMapper");
        }
    }

    static class BookCountReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, LongWritable> {

        // Count how many values arrived for the key; that count is the number of lines.
        @Override
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
            long sumBookNum = 0;
            while (values.hasNext()) {
                sumBookNum = sumBookNum + 1;
                values.next();
            }
            logger.info("foxson_BookCountReducer_ok");
            output.collect(key, new LongWritable(sumBookNum));
            System.out.println("foxsonReduce");
        }
    }
}
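
To make clear what the job above computes, here is a minimal sketch of the same map-then-reduce logic in plain Java, with no Hadoop classes involved. The class name LineCountSketch and its methods are made up for illustration; only the "booknum" key and the count-the-values idea come from the program above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Hadoop job; it only mimics the data flow.
final class LineCountSketch {

    // "Map" step: emit the constant key "booknum" with the value 1 for every input line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(new AbstractMap.SimpleEntry<>("booknum", 1));
        }
        return pairs;
    }

    // "Shuffle" and "reduce" steps: group by key, then count the values seen per key.
    static Map<String, Long> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), 1L, Long::sum);
        }
        return counts;
    }

    // Run both steps over an in-memory list of lines.
    static Map<String, Long> countLines(List<String> lines) {
        return reduce(map(lines));
    }
}
```

Feeding countLines the 12 bookname lines gives a single entry mapping booknum to 12, which is the result the real job should write into booklistResultDir.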

New API example using the mapreduce package

File BookCountNew.java:

package bookCount;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

public class BookCountNew extends Configured implements Tool {

    public static final Logger logger = Logger.getLogger(BookCountNew.class);

    public static void main(String[] args) throws Exception {
        PropertyConfigurator.configure("conf/log4j.properties");
        logger.info("BookCountNew starting");
        System.setProperty("HADOOP_USER_NAME", "hduser");
        Configuration conf = new Configuration();
        int res = ToolRunner.run(conf, new BookCountNew(), args);
        logger.info("BookCountNew end");
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        try {
            Configuration conf = getConf();
            // New API: obtain the Job from the static factory method, not the constructor.
            Job job = Job.getInstance(conf, "bookCount_new_sample_job");
            job.setJarByClass(getClass());
            job.setMapperClass(BookCountMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            job.setReducerClass(BookCountReducer.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            TextInputFormat.addInputPath(job, new Path("booklist.log"));
            TextOutputFormat.setOutputPath(job, new Path("booklistResultDir"));
            // The reducer emits (Text, LongWritable), so declare LongWritable here.
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            // Return the status rather than calling System.exit() here, so main() can log "end" and exit.
            return job.waitForCompletion(true) ? 0 : 1;
        } catch (Exception e) {
            logger.error(e.getMessage(), e);
            return 1;
        }
    }

    static class BookCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        // Emit ("booknum", 1) once per input line.
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            context.write(new Text("booknum"), new IntWritable(1));
            logger.info("foxson_mapper_ok");
            System.out.println("foxsonMapper");
        }
    }

    static class BookCountReducer extends Reducer<Text, IntWritable, Text, LongWritable> {

        // Count how many values arrived for the key; that count is the number of lines.
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            long sumBookNum = 0;
            for (IntWritable value : values) {
                sumBookNum = sumBookNum + 1;
            }
            logger.info("foxson_BookCountReducer_ok");
            context.write(key, new LongWritable(sumBookNum));
            System.out.println("foxsonReduce");
        }
    }
}
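
One visible difference between the two listings is the reduce signature: the old API hands the reducer an Iterator, while the new API hands it an Iterable. The plain-Java sketch below isolates just that difference. ReduceLoopSketch is a made-up class, and Integer stands in for IntWritable so that the snippet runs without Hadoop.

```java
import java.util.Iterator;

// Hypothetical helper that contrasts the two looping styles; not part of Hadoop.
final class ReduceLoopSketch {

    // Old-API style: an Iterator has to be advanced by hand.
    static long countWithIterator(Iterator<Integer> values) {
        long sum = 0;
        while (values.hasNext()) {
            values.next(); // consume the element, otherwise the loop never ends
            sum++;
        }
        return sum;
    }

    // New-API style: an Iterable works directly in a for-each loop.
    static long countWithIterable(Iterable<Integer> values) {
        long sum = 0;
        for (Integer ignored : values) {
            sum++;
        }
        return sum;
    }
}
```

Both methods return the same count for the same values; the for-each form simply removes the chance of forgetting the next() call.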

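The examples above are there for you to learn from. Next, here is how to look things up in the API docs.

We will stick with the example above:
1. Browse the Hadoop 2.4 API online
First open the link below: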
http://hadoop.apache.org/docs/r2.4.0/api/index.html

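Once the page is open, read it in this order:
as the figure below shows,
follow the order 1-->2-->3.
In other words: to find out which classes, interfaces, and so on a package contains, look at area 2; to see the details of a class or interface, such as which methods it has and what each one does, look at area 3.

[Figure: api结构.png, the layout of the API documentation page with areas 1, 2, and 3 marked]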

2. Old API methods and examples

Here we take JobConf as the example:

Following the reading order in the figure above, we arrive at the code below:

// Create a new JobConf
JobConf job = new JobConf(new Configuration(), MyJob.class);

// Specify various job-specific parameters
job.setJobName("myjob");

FileInputFormat.setInputPaths(job, new Path("in"));
FileOutputFormat.setOutputPath(job, new Path("out"));

job.setMapperClass(MyJob.MyMapper.class);
job.setCombinerClass(MyJob.MyReducer.class);
job.setReducerClass(MyJob.MyReducer.class);

job.setInputFormat(SequenceFileInputFormat.class);
job.setOutputFormat(SequenceFileOutputFormat.class);

3. New API methods and examples

The docs give this example:

  // Create a new Job
     Job job = new Job(new Configuration());
     job.setJarByClass(MyJob.class);
     
     // Specify various job-specific parameters     
     job.setJobName("myjob");
     
     job.setInputPath(new Path("in"));
     job.setOutputPath(new Path("out"));
     
     job.setMapperClass(MyJob.MyMapper.class);
     job.setReducerClass(MyJob.MyReducer.class);

     // Submit the job, then poll for progress until the job is complete
     job.waitForCompletion(true);

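Paste the snippet above into Eclipse and something is clearly off:
the new Job(...) constructor call is shown with a strikethrough, which means it is deprecated.
Note also that the new-API program earlier in this post sets its paths through TextInputFormat.addInputPath(job, ...) and TextOutputFormat.setOutputPath(job, ...), not through the job.setInputPath and job.setOutputPath calls in this snippet.

So we keep looking:
getInstance() has many overloads. Overloading should need no explanation if you have studied object-oriented programming: the methods share one name but differ in the number or types of their parameters.
Let's try this one; the new-API program above instantiates its Job in exactly this way. This style of instantiation is the factory pattern (more precisely, a static factory method), which is also worth reading up on.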
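
To illustrate the two ideas just mentioned, overloading and factory-style instantiation, here is a small sketch in plain Java. SketchJob is a made-up class, not Hadoop's Job; its String parameters merely stand in for a configuration and a job name, and the helper publicConstructorIsDeprecated exists only to show that the annotation is what an IDE reads when it draws the strikethrough.

```java
import java.lang.reflect.Constructor;

// Hypothetical class for illustration only; it is not Hadoop's Job.
final class SketchJob {
    private final String config;
    private final String name;

    // Still callable, but marked deprecated; IDEs render calls to it struck through.
    @Deprecated
    public SketchJob(String config) {
        this(config, "");
    }

    private SketchJob(String config, String name) {
        this.config = config;
        this.name = name;
    }

    // Overloaded static factory methods: same name, different parameter lists.
    public static SketchJob getInstance() {
        return new SketchJob("default", "");
    }

    public static SketchJob getInstance(String config) {
        return new SketchJob(config, "");
    }

    public static SketchJob getInstance(String config, String name) {
        return new SketchJob(config, name);
    }

    public String getConfig() {
        return config;
    }

    public String getName() {
        return name;
    }

    // Reports whether the public one-argument constructor carries @Deprecated.
    static boolean publicConstructorIsDeprecated() {
        try {
            Constructor<SketchJob> ctor = SketchJob.class.getConstructor(String.class);
            return ctor.isAnnotationPresent(Deprecated.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A caller would write SketchJob.getInstance("cluster.xml", "bookCount_new_sample_job") and never touch the constructor, which mirrors how the new-API program obtains its Job through Job.getInstance(conf, name).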

That wraps up the API lookup; there are many more methods you can go and find for yourself.
