Hadoop Custom Data Types

Custom Data Type - As Value

To create a custom data type for use as a value, implement the org.apache.hadoop.io.Writable interface. It declares two methods: write(DataOutput out), which serializes the object's fields, and readFields(DataInput in), which deserializes them in the same order.

Note:

  • If you add a custom constructor to your Writable class, keep the default no-argument constructor as well. Hadoop creates Writable instances through reflection and then populates them with readFields(), so it needs a constructor that takes no arguments.
  • TextOutputFormat writes keys and values using their toString() methods. If you use TextOutputFormat to write instances of your custom Writable type, give the type a meaningful toString() implementation.
  • When reading input data, Hadoop may reuse a single Writable instance for many records. Do not rely on the object's existing state inside readFields(); overwrite every field from the input stream.
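A minimal sketch of a value type that follows the notes above. The class PageVisitWritable and its fields are hypothetical. So that the sketch compiles with only the JDK, it declares a local Writable interface with the same two methods as org.apache.hadoop.io.Writable; in a real job you would delete that interface and import the Hadoop one.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Stand-in with the same two methods as org.apache.hadoop.io.Writable, declared
// only so this sketch compiles without Hadoop. In a real job, delete it and use
// "import org.apache.hadoop.io.Writable;" instead.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type holding a page URL and its visit count.
class PageVisitWritable implements Writable {
    private String url;
    private long visits;

    // No-argument constructor: the framework instantiates the class
    // reflectively and then calls readFields(), so it must exist.
    public PageVisitWritable() {
        this("", 0L);
    }

    // Convenience constructor for user code.
    public PageVisitWritable(String url, long visits) {
        this.url = url;
        this.visits = visits;
    }

    public String getUrl() { return url; }
    public long getVisits() { return visits; }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fields are written in a fixed order...
        out.writeUTF(url);
        out.writeLong(visits);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // ...and read back in exactly the same order. Every field is
        // overwritten, so state left over from a reused instance is discarded.
        url = in.readUTF();
        visits = in.readLong();
    }

    // TextOutputFormat writes non-Text keys and values using toString().
    @Override
    public String toString() {
        return url + "\t" + visits;
    }
}
```

With the real Hadoop interface, such a class would typically be registered on the job with job.setMapOutputValueClass(PageVisitWritable.class) or job.setOutputValueClass(PageVisitWritable.class).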

Custom Data Type - As Key

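To create a custom data type for use as a key, implement the org.apache.hadoop.io.WritableComparable interface. It extends both Writable and java.lang.Comparable, so you need to implement the following methods:

  • void write(DataOutput out) and void readFields(DataInput in) : serialize and deserialize the fields, exactly as for a value type
  • int compareTo(T other) : defines the order in which keys are sorted before they reach the reducer

It is also advisable to override hashCode() and equals(). The default HashPartitioner uses hashCode() to decide which reducer receives each key, so equal keys must return equal hash codes.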
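A sketch of a key type, assuming a hypothetical composite key of year and month. Local stand-ins replace org.apache.hadoop.io.Writable and org.apache.hadoop.io.WritableComparable so the code compiles without Hadoop on the classpath; in a real job you would delete them and import the Hadoop interfaces.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Stand-ins mirroring org.apache.hadoop.io.Writable and
// org.apache.hadoop.io.WritableComparable, declared only so this sketch
// compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

interface WritableComparable<T> extends Writable, Comparable<T> {
}

// Hypothetical composite key: (year, month).
class YearMonthKey implements WritableComparable<YearMonthKey> {
    private int year;
    private int month;

    // Required no-argument constructor for reflective instantiation.
    public YearMonthKey() { }

    public YearMonthKey(int year, int month) {
        this.year = year;
        this.month = month;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeInt(month);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        year = in.readInt();
        month = in.readInt();
    }

    // Sort order of keys: by year, then by month.
    @Override
    public int compareTo(YearMonthKey other) {
        int byYear = Integer.compare(year, other.year);
        return byYear != 0 ? byYear : Integer.compare(month, other.month);
    }

    // Computed only from the fields, so the same key hashes the same way in
    // every JVM (unlike the identity-based Object.hashCode()).
    @Override
    public int hashCode() {
        return 31 * year + month;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof YearMonthKey)) return false;
        YearMonthKey k = (YearMonthKey) o;
        return year == k.year && month == k.month;
    }

    @Override
    public String toString() {
        return year + "-" + month;
    }
}
```

Keeping compareTo(), equals(), and hashCode() based on the same fields means keys that sort as equal are also grouped and partitioned together.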

Custom Data Type - As Value Wrapping Different Data Types

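To create a custom data type that wraps values of different types together for use as a value, extend the abstract class org.apache.hadoop.io.GenericWritable and implement the following method:

  • protected Class<? extends Writable>[] getTypes() : returns the array of Writable classes that the wrapper is allowed to carry. Each serialized value records its class as an index into this array, so the order of the classes must stay the same between writing and reading.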
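A sketch of a GenericWritable subclass, under stated assumptions. The base class here is a simplified stand-in for org.apache.hadoop.io.GenericWritable that imitates its serialization roughly (a one-byte index into getTypes() followed by the wrapped object's own fields) so the code runs with only the JDK; the real class does more, for example Configuration handling. CountValue, NameValue, and MixedValueWritable are hypothetical names. Only the MixedValueWritable part corresponds to code you would write against the real Hadoop class.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Stand-in mirroring org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Simplified stand-in for org.apache.hadoop.io.GenericWritable. In a real job,
// delete it and extend the Hadoop class instead.
abstract class GenericWritable implements Writable {
    private byte type;
    private Writable instance;

    protected abstract Class<? extends Writable>[] getTypes();

    // Wraps obj and remembers the index of its class in getTypes().
    public void set(Writable obj) {
        Class<? extends Writable>[] types = getTypes();
        for (int i = 0; i < types.length; i++) {
            if (types[i] == obj.getClass()) {
                type = (byte) i;
                instance = obj;
                return;
            }
        }
        throw new IllegalArgumentException("Unsupported type: " + obj.getClass().getName());
    }

    public Writable get() {
        return instance;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeByte(type);   // which class follows
        instance.write(out);   // the wrapped object's own fields
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        type = in.readByte();
        try {
            // Relies on each wrapped class having a no-argument constructor.
            instance = getTypes()[type & 0xff].getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IOException("Cannot instantiate wrapped type", e);
        }
        instance.readFields(in);
    }
}

// Two hypothetical value types that a reducer might receive together.
class CountValue implements Writable {
    long count;
    public CountValue() { }
    public CountValue(long count) { this.count = count; }
    @Override public void write(DataOutput out) throws IOException { out.writeLong(count); }
    @Override public void readFields(DataInput in) throws IOException { count = in.readLong(); }
}

class NameValue implements Writable {
    String name = "";
    public NameValue() { }
    public NameValue(String name) { this.name = name; }
    @Override public void write(DataOutput out) throws IOException { out.writeUTF(name); }
    @Override public void readFields(DataInput in) throws IOException { name = in.readUTF(); }
}

// The part you write yourself: list every type the wrapper may carry.
// The order matters, because each class's index is what gets serialized.
class MixedValueWritable extends GenericWritable {
    @SuppressWarnings("unchecked")
    private static final Class<? extends Writable>[] TYPES =
            new Class[] { CountValue.class, NameValue.class };

    @Override
    protected Class<? extends Writable>[] getTypes() {
        return TYPES;
    }
}
```

A mapper would call set() with either a CountValue or a NameValue, and the reducer would call get() and check the runtime type with instanceof.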