archive: 2013/5

Hadoop DistributedCache

DistributedCache Usage: The usage of DistributedCache is as follows: share data files/metadata/binary files among map and reduce tasks; add 3rd-party packages to the classpath. DistributedCache API: Basic

Install Jekyll in MacOS

This year, I changed the blog engine I use from blog.com to GitHub. The main reason is availability. blog.com had frequent downtime, particularly when I started using it. For the other

Git Catchup Changes

There are the following ways to catch up/revert changes in Git. Catch up changes from remote: pull from the remote again, and you lose all of your local changes as well as history: rm -Rf working_folders git c

GIT Tips At Weekend - Sunday

Cherry-Picking: git cherry-pick [--edit] [-n] [-m parent-number] [-s] [-x] <commit> Selectively merge a single commit from another local branch. Example: git cherry-pick 7300a6130d9447e18a931e898b

GIT Tips At Weekend - Saturday

Info: git reflog Use this to recover from major fuck ups! It’s basically a log of the last few actions and you might have luck and find old commits that have been lost by doing a complex merge. git diff

GIT Tips At Weekend - Friday

Git Setup. Git Clone: git clone [repo] clones the repository specified by [repo]. By default it runs git init and git remote add origin [repo]. Add colors by setting the ~/.gitconfig file: [color] ui = auto [colo

Hadoop Multiple Input and Output

The following is an example of using multiple inputs (org.apache.hadoop.mapreduce.lib.input.MultipleInputs) with different input formats and different mapper implementations. MultipleInputs.addInputP

Hadoop Customize Data Type

Customize Data Type - As Value: To create a customized data type used as a value, the data type must implement the org.apache.hadoop.io.Writable interface, which consists of two methods, readFields(

Hadoop Balancer

Whenever nodes are added to the cluster or a lot of data is deleted, we need to run the Hadoop balancer to balance the data across the datanodes. Otherwise, the over-utilized datanodes will become the bottl

Adding or Removing Hadoop Nodes

Here I give the complete steps. I have seen some versions of these steps before, but many of them are incomplete, confusing, or wrong, e.g. someone even stops the cluster to do that! Adding Nodes: In the