archive: 2013

Hive Composite Data Type

Hive currently supports the following composite data types: map: (key1, value1, key2, value2, …). Creates a map with the given key/value pairs. struct: (val1, val2, val3, …). Creates a struct with the give

SQL in MySQL and Pig Comparison

This comparison uses MySQL 5.1.x and Pig 0.8 as examples. Two sample files are used, as follows. 00. Prepared Files: cat /tmp/data_file_1 zhangsan 23 1 lisi 24 1 wangmazi 30 1 meinv

Disable Major Compaction in HBase Cluster

HBase consists of multiple regions. A region may have several Stores, each of which holds a single column family. An edit is first written to the hosting region store’s in-memory space, which is called MemS

Add Backup Master Node in HBase Cluster

How do you add a backup master node to the cluster? There are two ways of doing that. One way: start the HBase master daemon on the backup master node: hadoop@master2$ $HBASE_HOME/bin/ st

Big Data Platform

Here I compare the best-known vendors that offer Hadoop platforms for the enterprise. Below is a typical view of a big data analytics architecture.

Commonly Used Maven Plugins


Hadoop Counter

Hadoop counters help developers and users get an overall view of the status of running jobs. There are three types of counters: MapReduce-related, file-system-related, and job-related. The details can be s

Hadoop DistributedCache

DistributedCache Usage: DistributedCache is used as follows: share data files, metadata, and binary files among map and reduce tasks; add third-party packages to the classpath. DistributedCache API: Basic

Install Jekyll in MacOS

This year, I changed the blog engine I used from to GitHub. The main reason is availability. The has such frequent downtime, particularly when I started using it. For the other

Git Catchup Changes

There are the following ways to catch up or revert changes in Git. Catch up changes from remote: pull from the remote again, and you lose all of your local changes as well as history: rm -Rf working_folders git c