archive: 2013/4

Useful Hadoop ToolRunner

Developers quite often get pissed off by the following things: When you write the job configuration in the code of map and reduce, you need to repack everything whenever the parameters change. You need to

Hive vs. Pig

Both projects are top-level Apache projects for processing data in Hadoop. Here, I try to compare the differences. Below is a picture I found (I cannot find the original link, but there is a mirror here). In addition,

When to Disable Speculative Execution

Background: This is the link from WikiMedia about what Speculative Execution is. In Hadoop, the following parameter strings control this setting. They are true by default. mapred.map.tasks.specul

MRUnit for Now

Cloudera MRUnit helps with unit testing of MapReduce programs. Below is what it supports so far. The MapDriver and ReduceDriver support only a single key as input, which can make it more cumbersom