Wednesday, 8 June 2011

How to write a MapReduce job?

Skeleton of a basic MapReduce program:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyClass /*extends Configured implements Tool*/ {
/**
 * The Mapper class of MyClass
 */
public static class MyClassMapper
    extends Mapper<Object, Text, Text, IntWritable> {

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
            /* your code goes here */
            context.write(outputKey, outputValue); // outputKey and outputValue are objects of your output key/value types
    }
}
/**
 * The Reducer class of MyClass
 */
public static class MyClassReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
            /* your code goes here */
            context.write(outputKey, outputValue); // outputKey and outputValue are objects of your output key/value types
    }
}
/**
 * The main entry point.
 */
public static void main(String[] args) throws Exception {
    
      Configuration conf = new Configuration();
      Job job = new Job(conf, "skeleton");
      job.setJarByClass(MyClass.class);
      job.setMapperClass(MyClassMapper.class);
      job.setReducerClass(MyClassReducer.class);
      job.setOutputKeyClass(your_output_key_type.class);
      job.setOutputValueClass(your_output_value_type.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
   
      System.exit(job.waitForCompletion(true) ? 0 : 1);
   }
}

Note: your_output_key_type and your_output_value_type can be any of the Writable data type classes provided by Hadoop, such as Text, IntWritable, LongWritable, etc. The Mapper's output (key, value) types must match the Reducer's input (key, value) types, but the Reducer's output (key, value) types need not be the same as the Mapper's output (key, value) types.
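To make the skeleton concrete, here is a minimal sketch of the classic word count job written against the same org.apache.hadoop.mapreduce API (the class names WordCount, WordCountMapper and WordCountReducer are only illustrative). The Mapper splits each line into tokens and emits (word, 1) pairs as (Text, IntWritable); the Reducer sums the counts per word. Note that the Mapper's output types match the Reducer's input types, exactly as described in the note above.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    /** Emits (word, 1) for every token in each input line. */
    public static class WordCountMapper
        extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one); // output types: (Text, IntWritable)
            }
        }
    }

    /** Sums the counts for each word. */
    public static class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result); // output types: (Text, IntWritable)
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Once compiled and packaged into a jar, it can be run with something like: hadoop jar wordcount.jar WordCount <input_dir> <output_dir>, where the two arguments are HDFS paths (the jar and path names here are placeholders).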

2 comments:

  1. I think you could write in more detail on MapReduce: how to handle it from start to end with a cluster of nodes and how the data is shared, with an example.
    Thank you
    Digbijayee

  2. It would help more if you could give the full code instead of skeleton code.
