Apache Kafka Overview

Big data processing started with a focus on batch processing. Distributed data storage and querying tools like MapReduce, Hive, and Pig were all designed to process data in batches rather than

Apache HAWQ is Landing on HDP

News: Last week, Hortonworks announced that it will expand its strategic relationship with Pivotal. This will bring together Hortonworks’ expertise and support for data management and processing with Pivotal’s top

Scala Apply Method

The apply method in Scala comes with nice syntactic sugar. It allows us to define semantics like Java array access for an arbitrary class. For example, we create a class RiceCooker and its method cook

Develop Spark WordCount

It is quite common to set up an Apache Spark development environment through an IDE. Since I do not cover many IDE setup details in my Spark course, I am here to give detailed steps for developing the well know

Partially Applied Functions and Curry

Partially Applied Functions: When you invoke a Scala function, you need to apply the arguments to the function. If you pass all the expected arguments, the function is said to be fully applied. If you only

Scala Call by Value vs. Name

Starting today, I am working on a series of articles about how Scala is special and more powerful than regular programming languages, such as Java, under the tag scalatips. If you have any confusing topics, please

Use UDF in Spark DataFrame

It is very convenient to create, register, and use user-defined functions (UDFs) with data. In addition, the recent release of Apache Spark also supports writing user-defined aggregation functions (UDAFs). Belo

Happy New Year 2016

It is the end of 2015, and HAPPY NEW YEAR 2016. It is time to wrap up my writing calendar with a summary of Sparkera, myself, and the Big Data ecosystem. In 2015, I published 21 articles in

Build Big Data Warehouse With Apache Hive

Ten tools for ten big data areas 04_Apache Hive, from Will Du. The presentation above is the fourth topic I have covered in a series of talks about the Ten Tools for Ten Big Data Areas. Apache Hive is

Light Big Data With Apache Spark

Ten tools for ten big data areas 03_Apache Spark, from Will Du. The presentation above is the third topic I have covered in a series of talks about the Ten Tools for Ten Big Data Areas. Apache Spark is