Hadoop Customize Input Output Format

To customize inputformat, we need to do following

  • Create customized InputFormat class by extending Hadoop inputformat with your own type of class, such as

    public class LogFileInputFormat extends FileInputFormat<LongWritable, 
    LogWritable>
    
  • In above class, override

    public RecordReader<T,T> createRecordReader(InputSplit arg0, TaskAttemptContext arg1) throws IOException, InterruptedException {...}
    

    which returns customized RecordReader class.

  • In above class, override following one or all if you want to deal with splitable records

    List<InputSplit> getSplits(JobContext job)
    

    which generate the list of files and make them into FileSplits.

    protected boolean isSplitable(JobContext context, Path filename) 
    

    which decides if or when records are splitable.

  • Create customized RecordReader class by extending Hadoop RecordReader class, such as,

    public class LogFileRecordReader extends RecordReader<LongWritable, LogWritable>{...}
    
  • In above class override proper methods, such as

    public boolean nextKeyValue() throws IOException, InterruptedException {...}
    
    public LongWritable getCurrentKey() throws IOException, InterruptedException {...}
    
    public LogWritable getCurrentValue() throws IOException,InterruptedException {...}
    
    public float getProgress() throws IOException, InterruptedException {...}
    
    public void close() throws IOException {...}
    

To customize Outputformat, we need to do following

  • Create customized OutputFormat class by extending Hadoop inputformat with your own type of class, such as

    public class LogFileOutputFormat extends FileOutputFormat<LongWritable, LogWritable>{...}
    
  • In above class, override

    public RecordWriter getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException {...}
    
  • Create customized RecordWriter class by extending Hadoop RecordReader class, such as,

    extends RecordWriter<T,T>{...}
    
  • In above class override proper methods, such as

    public void write(TextArrayWritable key, NullWritable value) throws IOException, InterruptedException {...}
    
    public synchronized void close(TaskAttemptContext context) throws IOException {...}