Customize Data Type - As Value
To create a customized data type used as a value, the data type must implement the org.apache.hadoop.io.Writable interface, which consists of two methods: readFields() and write().
- void readFields(DataInput in) : Deserialize the fields of this object from 'in'.
- void write(DataOutput out) : Serialize the fields of this object to 'out'.
- If you add a custom constructor to your Writable class, make sure to retain the default empty constructor, since Hadoop instantiates Writable objects reflectively and requires it.
- TextOutputFormat uses the toString() method to serialize the key and value types. If you use TextOutputFormat to serialize instances of your custom Writable type, make sure it has a meaningful toString() implementation.
- While reading input data, Hadoop may reuse the same instance of a Writable class repeatedly. Do not rely on the existing state of the object when populating it inside readFields().
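The points above can be sketched in a small example. This is a minimal sketch, not Hadoop's own code: the Writable interface is declared locally as a stand-in for org.apache.hadoop.io.Writable (same method signatures) so the example compiles without Hadoop on the classpath, and the PageHit class with its url/hits fields is hypothetical.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Local stand-in for org.apache.hadoop.io.Writable (same method signatures),
// declared here only so this sketch compiles without Hadoop on the classpath.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical custom value type: a page URL paired with a hit count.
class PageHit implements Writable {
    private String url = "";
    private long hits;

    // Retain the no-argument constructor: Hadoop creates Writables reflectively.
    public PageHit() {}

    public PageHit(String url, long hits) {
        this.url = url;
        this.hits = hits;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);   // serialize fields in a fixed order
        out.writeLong(hits);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Overwrite every field: Hadoop may reuse this instance across records,
        // so no state from a previous record must survive.
        url = in.readUTF();
        hits = in.readLong();
    }

    // Meaningful toString(), since TextOutputFormat writes values as text.
    @Override
    public String toString() {
        return url + "\t" + hits;
    }
}
```

Note that readFields() assigns every field unconditionally, which is what makes instance reuse safe.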
Customize Data Type - As Key
To create a customized data type used as a key, we need to implement the org.apache.hadoop.io.WritableComparable interface, which requires the following methods:
- void readFields(DataInput in) : Deserialize the fields of this object from 'in'.
- void write(DataOutput out) : Serialize the fields of this object to 'out'.
- int compareTo(T o) : Compares this object with the specified object for order; this defines the sort order of keys.
- int hashCode() : Used by the default HashPartitioner to decide which reducer a key is sent to.
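The methods above can be illustrated with a small key type. Again this is a hedged sketch: the Writable and WritableComparable interfaces are declared locally as stand-ins for Hadoop's (with the same method signatures), and the YearMonthKey class is a hypothetical composite key sorted chronologically.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Local stand-ins for Hadoop's interfaces, so the sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
interface WritableComparable<T> extends Writable, Comparable<T> {}

// Hypothetical composite key: a (year, month) pair.
class YearMonthKey implements WritableComparable<YearMonthKey> {
    private int year;
    private int month;

    public YearMonthKey() {} // required no-argument constructor

    public YearMonthKey(int year, int month) {
        this.year = year;
        this.month = month;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeInt(month);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        year = in.readInt();
        month = in.readInt();
    }

    // Defines the sort order of the keys seen by the reducers: by year, then month.
    @Override
    public int compareTo(YearMonthKey o) {
        int c = Integer.compare(year, o.year);
        return c != 0 ? c : Integer.compare(month, o.month);
    }

    // The default HashPartitioner uses hashCode() to pick a reducer,
    // so equal keys must produce equal hash codes.
    @Override
    public int hashCode() {
        return 31 * year + month;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof YearMonthKey)) return false;
        YearMonthKey k = (YearMonthKey) obj;
        return year == k.year && month == k.month;
    }
}
```

Overriding equals() alongside hashCode() keeps the usual Java contract: keys that compare equal hash to the same partition.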
Customize Data Type - As Value With Different Data Types
To create a customized data type wrapping different types of data together used as a value, we need to extend the org.apache.hadoop.io.GenericWritable abstract class and override the following method:
- protected Class<? extends Writable>[] getTypes() : Return the array of wrapped Writable type classes.
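A sketch of the idea follows. The GenericWritable base class below is a minimal local approximation of how org.apache.hadoop.io.GenericWritable works (it serializes an index into getTypes() followed by the wrapped instance), written here only so the example is self-contained; the real Hadoop class also implements Configurable. The IntValue, TextValue, and MultiValue names are hypothetical.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Local stand-in for org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Minimal local approximation of org.apache.hadoop.io.GenericWritable.
abstract class GenericWritable implements Writable {
    private Writable instance;

    // Subclasses return the fixed array of wrapped Writable classes.
    protected abstract Class<? extends Writable>[] getTypes();

    public void set(Writable obj) { instance = obj; }
    public Writable get() { return instance; }

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize the index of the wrapped type, then the instance itself.
        Class<? extends Writable>[] types = getTypes();
        for (int i = 0; i < types.length; i++) {
            if (types[i] == instance.getClass()) {
                out.writeByte(i);
                instance.write(out);
                return;
            }
        }
        throw new IOException("unregistered type: " + instance.getClass());
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Read the type index back and instantiate the matching class.
        int idx = in.readByte();
        try {
            instance = getTypes()[idx].getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IOException(e);
        }
        instance.readFields(in);
    }
}

// Two hypothetical value types the wrapper can carry.
class IntValue implements Writable {
    int v;
    public IntValue() {}
    public IntValue(int v) { this.v = v; }
    public void write(DataOutput out) throws IOException { out.writeInt(v); }
    public void readFields(DataInput in) throws IOException { v = in.readInt(); }
}

class TextValue implements Writable {
    String s = "";
    public TextValue() {}
    public TextValue(String s) { this.s = s; }
    public void write(DataOutput out) throws IOException { out.writeUTF(s); }
    public void readFields(DataInput in) throws IOException { s = in.readUTF(); }
}

// The concrete wrapper: override getTypes() to list the wrapped classes.
class MultiValue extends GenericWritable {
    @SuppressWarnings("unchecked")
    private static final Class<? extends Writable>[] TYPES =
            new Class[] { IntValue.class, TextValue.class };

    @Override
    protected Class<? extends Writable>[] getTypes() {
        return TYPES;
    }
}
```

A mapper can then emit a MultiValue wrapping either an IntValue or a TextValue under the same value class, and the reducer inspects get() to find out which one arrived.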