Hive Get the Max/Min Value Rows

Most of the time, we need to find the max or min value of particular columns together with the other columns of the same row. For example, we have the following employee table. > SELECT name,sex_age.sex AS sex,sex_

Moving to the Spark

It has been a while since the blog was last updated, as 2014 was a really busy year. Having almost completed my first book recently, I think it is the right time to start a new journey in big data for rea

Hive and Hadoop Exceptions

I installed Hive 1.0.0 on Hadoop 1.2.1. When I tried to enter the Hive CLI, it reported the following exception: org.apache.hadoop.hive.ql.metadata.HiveException:java.io.IOException:Filesystem closed Accor

Steps to Set Up an EC2 Cluster for Hadoop

Get the Access Key ID and Secret Access Key and store them in a notepad. The keys will be used when creating EC2 instances. If you do not have keys yet, generate a new set. Go to the VPC Management con

Hive Composite Data Type

Currently, Hive supports the following composite data types. map: (key1, value1, key2, value2, …) creates a map with the given key/value pairs. struct: (val1, val2, val3, …) creates a struct with the give

SQL in MySQL and Pig Comparison

This comparison uses MySQL 5.1.x and Pig 0.8 as examples. Two sample files are used as follows. 00. Prepared Files: cat /tmp/data_file_1 zhangsan 23 1 lisi 24 1 wangmazi 30 1 meinv

Disable Major Compaction in HBase Cluster

HBase consists of multiple regions. While a region may have several Stores, each holds a single column family. An edit is first written to the hosting region store’s in-memory space, which is called MemS

Add Backup Master Node in HBase Cluster

How do you add a backup master node to the cluster? There are two ways of doing that. The first way: start the HBase master daemon on the backup master node: hadoop@master2$ $HBASE_HOME/bin/hbase-daemon.sh st

Big Data Platform

Here I compare the best-known vendors that offer Hadoop platforms for the enterprise. Below is a typical view of a big data analytics architecture.