Category: Blog

Hive Performance Tuning - No. of MapReduce

1. Set a proper number of map tasks. Most of the time, a job generates one or more map tasks based on the number of input directories. Several factors matter, such as the number of input files, the size of input files
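To make the input-file factors concrete, here is a minimal Python sketch of how Hadoop's `FileInputFormat` cuts files into splits, with one map task per split. It assumes a 128 MB block size and plain `FileInputFormat` behaviour with no small-file combining. Hive usually runs with `CombineHiveInputFormat` (the `hive.input.format` setting), which merges small files, so real counts can be lower. `min_size` and `max_size` stand in for `mapreduce.input.fileinputformat.split.minsize` and `mapreduce.input.fileinputformat.split.maxsize`; the function names are hypothetical.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # Hadoop lets the last split run up to 10% over the split size


def split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Mirror FileInputFormat.computeSplitSize: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def splits_for_file(file_len, size):
    """Count the splits cut from one file (no small-file combining)."""
    if file_len == 0:
        return 1  # a zero-length file still gets one empty split
    count = 0
    remaining = file_len
    # Keep cutting full-size splits while the rest is more than 1.1x a split.
    while remaining / size > SPLIT_SLOP:
        count += 1
        remaining -= size
    if remaining != 0:
        count += 1  # the tail becomes the final split
    return count


def estimate_map_tasks(file_sizes, block_size=128 * MB, min_size=1, max_size=2**63 - 1):
    """Estimate map tasks for a list of file sizes in bytes (assumed 128 MB blocks)."""
    size = split_size(block_size, min_size, max_size)
    return sum(splits_for_file(n, size) for n in file_sizes)


print(estimate_map_tasks([1024 * MB]))                     # one 1 GB file
print(estimate_map_tasks([1 * MB] * 100))                  # 100 small files
print(estimate_map_tasks([1024 * MB], min_size=256 * MB))  # larger minimum split
```

Under these assumptions one 1 GB file gives 8 map tasks while 100 files of 1 MB each give 100, which is why many small files inflate the map count; raising `min_size` to 256 MB cuts the 1 GB file down to 4.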

Hive Regular Expression SerDe

Unless there is no way to use a built-in parser, I do not recommend writing a user-defined SerDe. Do not forget that Hive comes with a contrib RegexSerDe class, which can tokenize your logs/files to resol
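The idea behind a regex-based SerDe is that each capture group in the `input.regex` property becomes one table column, and the contrib RegexSerDe expects those columns to be strings. The Python sketch below imitates that mapping for a single line; it is an illustration, not Hive's code. The access-log pattern, column names, and the `deserialize` helper are hypothetical, and turning an unmatched line into a row of `None` values is an assumption of this sketch.

```python
import re
from typing import List, Optional

# Hypothetical Apache-style access-log pattern: one capture group per column.
INPUT_REGEX = r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3}) (\d+|-)'
COLUMNS = ["host", "time", "method", "path", "status", "size"]


def deserialize(line: str, pattern: re.Pattern, num_columns: int) -> List[Optional[str]]:
    """Split one raw line into column values, all kept as strings."""
    m = pattern.fullmatch(line)  # the whole line must match, like Java's Matcher.matches()
    if m is None:
        # Assumption for this sketch: an unmatched line becomes a row of NULLs.
        return [None] * num_columns
    return list(m.groups())


pattern = re.compile(INPUT_REGEX)
line = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
print(dict(zip(COLUMNS, deserialize(line, pattern, len(COLUMNS)))))
```

Compiling the pattern once and reusing it for every line keeps the per-row cost to a single match, which is the same reason a SerDe compiles its regex when it is initialized rather than per record.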

A Little About the MapReduce Combiner

A combiner reduces the amount of intermediate map output that has to be shuffled to the reducers, which can noticeably improve overall performance. Keep the following two points in mind when using it. Your map and reduc
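A toy word count shows where the saving comes from. This is a plain-Python simulation rather than the Hadoop API: each inner list stands for one map task's split, and the combiner pre-sums that task's `(word, 1)` pairs before the shuffle. Summation is used because it is associative and commutative; Hadoop treats the combiner as an optional optimization that may run zero, one, or several times, so the final result must not depend on it. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def map_words(line: str) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in the line."""
    return [(w, 1) for w in line.split()]


def combine(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Combiner: pre-aggregate one map task's output locally."""
    totals: Counter = Counter()
    for word, n in pairs:
        totals[word] += n
    return list(totals.items())


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reducer: the same summation over everything shuffled to it."""
    totals: Dict[str, int] = {}
    for word, n in pairs:
        totals[word] = totals.get(word, 0) + n
    return totals


def run_job(splits: List[List[str]], use_combiner: bool) -> Tuple[Dict[str, int], int]:
    """Run the toy job; return (final counts, number of records shuffled)."""
    shuffled: List[Tuple[str, int]] = []
    for split in splits:  # one map task per split
        out = [pair for line in split for pair in map_words(line)]
        if use_combiner:
            out = combine(out)
        shuffled.extend(out)
    return reduce_counts(shuffled), len(shuffled)


splits = [["a b a", "a c"], ["b b a"]]
print(run_job(splits, use_combiner=False))
print(run_job(splits, use_combiner=True))
```

Both runs produce the same counts, but with the combiner only 5 records cross the shuffle instead of 8; on real data with many repeated keys per map task the reduction is usually much larger.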

What is Predictive Analytics

Predictive analytics can be broken down into three broad categories: recommenders, classification, and clustering. Recommenders: recommender systems suggest items based on past behavior or interest. These it
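As one concrete reading of "suggest items based on past behavior", here is a minimal user-based collaborative-filtering sketch. It is only one of many recommender techniques, swapped in for illustration: users are compared by the Jaccard similarity of the item sets they chose, and items a user has not seen are scored by how similar their choosers are. The function names and the toy history are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two users' item sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(history: Dict[str, Set[str]], user: str, k: int = 3) -> List[str]:
    """Score unseen items by the similarity of the other users who chose them."""
    seen = history[user]
    scores: Dict[str, float] = {}
    for other, items in history.items():
        if other == user:
            continue
        sim = jaccard(seen, items)
        if sim == 0.0:
            continue  # users with nothing in common contribute nothing
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:k]


# Hypothetical toy data: which tools each user has used.
history = {
    "alice": {"hive", "hadoop"},
    "bob": {"hive", "hadoop", "pig"},
    "carol": {"hive", "spark"},
    "dave": {"excel"},
}
print(recommend(history, "alice"))
```

Here bob overlaps with alice more (2/3) than carol does (1/3), so "pig" ranks above "spark"; dave shares nothing with alice and does not influence her list.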