archive: 2013

Hive Composite Data Type

For now, Hive supports the following composite data types. map: (key1, value1, key2, value2, …). Creates a map with the given key/value pairs. struct: (val1, val2, val3, …). Creates a struct with the give

SQL in MySQL and Pig Comparison

This comparison uses MySQL 5.1.x and Pig 0.8 as examples. Two sample files are used as follows. 00. Prepared Files: cat /tmp/data_file_1 zhangsan 23 1 lisi 24 1 wangmazi 30 1 meinv

Disable Major Compaction in HBase Cluster

HBase consists of multiple regions. While a region may have several Stores, each holds a single column family. An edit first writes to the hosting region store’s in-memory space, which is called MemS

Add Backup Master Node in HBase Cluster

How do you add a backup master node to the cluster? There are two ways of doing that. One way: start the HBase master daemon on the backup master node: hadoop@master2$ $HBASE_HOME/bin/hbase-daemon.sh st

Big Data Platform

Here I compare the best-known vendors that offer Hadoop platforms for enterprises. Below is a typical vision of a big data analytics architecture.

Commonly Used Maven Plugins

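Background: As we all know, Maven is essentially a plugin framework. Its core does not perform any concrete build tasks; all of these tasks are delegated to plugins. For example, compiling source code is done by maven-compiler-plugin. Furthermore, each task corresponds to a plugin goal, and each plugin has one or more goals. For example, the compile goal of maven-compiler-plugin compiles the main source code located under src/main/java/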

Hadoop Counter

Hadoop counters help developers and users get an overall view of the status of running jobs. There are three types of counters: MapReduce-related, file-system-related, and job-related. The details can be s

Hadoop DistributedCache

DistributedCache Usage: DistributedCache is used to share data files, metadata, and binary files among map and reduce tasks, and to add third-party packages to the classpath. DistributedCache API: Basic

Install Jekyll in MacOS

This year, I switched my blog engine from blog.com to GitHub. The main reason is availability. blog.com had frequent downtime, particularly when I started using it. For the other

Git Catchup Changes

There are the following ways to catch up on or revert changes in Git. Catch up changes from remote: pull from the remote again, which loses all of your local changes as well as history: rm -Rf working_folders git c