tag : hadoop

Happy New Year 2016

It is the end of 2015, and HAPPY NEW YEAR 2016! It is time to wrap up my writing calendar with a summary of Sparkera, myself, and the Big Data ecosystem. In 2015, I published 21 articles in

Hadoop Streaming

1. Streaming Overview: Hadoop Streaming is a generic API which allows writing Mappers and Reducers in any language. Develop MapReduce jobs in practically any language. Uses Unix streams as communicatio
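
As a quick illustration of the streaming model, a word-count Mapper and Reducer can be plain Python scripts that read stdin and write tab-separated key/value pairs to stdout; the script names below are placeholders, not from the original post.

#!/usr/bin/env python
# mapper.py: emit "word<TAB>1" for every word read from stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t1" % word)

#!/usr/bin/env python
# reducer.py: input arrives sorted by key, so counts per word can be summed in one pass
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t", 1)
    if word != current_word:
        if current_word is not None:
            print("%s\t%d" % (current_word, count))
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print("%s\t%d" % (current_word, count))

They would be launched with something like: hadoop jar hadoop-streaming.jar -input in -output out -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py (the streaming jar location varies by Hadoop version).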

Apache Hive Essentials Published

Finally, I made it. I got it published after working on it for 6 months. Apache Hive Essentials: my very first book, and also the first book on Apache Hive 1.0.0 in the world. Check it out here

Hive and Hadoop Exceptions

I installed Hive 1.0.0 on Hadoop 1.2.1. When I try to enter the Hive CLI, it reports the following exception: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Filesystem closed Accor

Data Lake Stages

Edd recently posted a very impressive blog about how the Hadoop ecosystem influences the data lake in the enterprise. It discusses the following four stages of an enterprise's data evolution toward the dr

Moving to the Spark

It has been a while since the blog was last updated, as 2014 was a really busy year. Having almost completed my first book recently, I think it is the right time to start a new journey in big data for rea

Steps to setup EC2 cluster for Hadoop

Get the Access Key ID and Secret Access Key and store them in a notepad; the keys will be used when creating EC2 instances. If they are not there, generate a new set of keys. Go to the VPC Management con
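
The saved keys can also be used programmatically instead of through the console; below is a minimal sketch with boto3 (my substitution, since the post itself walks through the AWS console), where the region, AMI, instance type, and key pair name are all placeholders.

# Sketch: launch EC2 instances for a Hadoop cluster using the saved access keys.
# Region, AMI id, instance type, and key pair name are illustrative only.
import boto3

ec2 = boto3.client(
    "ec2",
    region_name="us-east-1",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
)

# One master plus three workers for a small cluster.
response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",
    InstanceType="m1.large",
    KeyName="hadoop-cluster-key",
    MinCount=4,
    MaxCount=4,
)
for instance in response["Instances"]:
    print(instance["InstanceId"])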

Hadoop Counter

Hadoop counters help developers and users get an overall status of running jobs. There are three types of counters: MapReduce related, file system related, and job related. The details can be s
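
For a streaming job, a user-defined counter can be incremented by writing Hadoop's reporter protocol to stderr; the group name MyApp and counter name MalformedLines below are made up for illustration.

#!/usr/bin/env python
# mapper.py: increments a custom counter for every malformed input line.
# A "reporter:counter:<group>,<counter>,<amount>" line on stderr is picked up
# by Hadoop Streaming and shown alongside the built-in job counters.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        sys.stderr.write("reporter:counter:MyApp,MalformedLines,1\n")
        continue
    print("%s\t%s" % (fields[0], fields[1]))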

Hadoop DistributedCache

DistributedCache Usage: The usage of DistributedCache is as follows: share data files/metadata/binary files among map and reduce tasks; add 3rd party packages to the classpath. DistributedCache API: Basic
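
As a sketch of the same idea from a streaming job (not the Java DistributedCache API itself), a small lookup file shipped to every task, e.g. with -files lookup.txt, shows up in the task's working directory and can be opened by name; lookup.txt and the join logic are assumptions for illustration.

#!/usr/bin/env python
# mapper.py: joins each input key against a small lookup table that was
# distributed to every task via the distributed cache (e.g. -files lookup.txt).
import sys

# The cached file appears in the task's current working directory.
lookup = {}
with open("lookup.txt") as f:
    for line in f:
        key, value = line.rstrip("\n").split("\t", 1)
        lookup[key] = value

for line in sys.stdin:
    key = line.strip()
    print("%s\t%s" % (key, lookup.get(key, "UNKNOWN")))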