tag : hive

Commonly Used Hive Setting

I list only the most commonly used ones. Enable dynamic partitioning (partition values come from a column): SET hive.exec.dynamic.partition=true; Make inserts honor the table's bucketing, so rows are saved in a form that can be sampled: SET hive.enforce.bucketing = true; Reduce...
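
To show the two settings above in context, here is a minimal sketch. All table and column names (page_views_staging, page_views_by_day, users_staging, users_bucketed) are hypothetical, and the extra hive.exec.dynamic.partition.mode line is my assumption: it is only needed when every partition column is dynamic.

```sql
-- Dynamic partitioning: the partition value is taken from the last SELECT column.
SET hive.exec.dynamic.partition=true;
-- Assumption: no static partition is given, so strict mode must be relaxed.
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE page_views_by_day PARTITION (view_date)
SELECT user_id, url, view_date
FROM page_views_staging;

-- Bucketing: make the insert write one file per bucket of the target table.
SET hive.enforce.bucketing = true;

CREATE TABLE users_bucketed (user_id BIGINT, name STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS;

INSERT OVERWRITE TABLE users_bucketed
SELECT user_id, name
FROM users_staging;
```

Once the table is bucketed this way, a query can sample it with TABLESAMPLE(BUCKET 1 OUT OF 32 ON user_id).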

Hive Performance Tuning - No. of MapReduce

1. Set a proper number of map tasks. Most of the time, a job generates one or more map tasks depending on its input directories. Several factors matter, such as the number of input files, the size of the input files...
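
As a rough sketch of how the number of map tasks can be influenced, the settings below adjust the input split size. The excerpt above is cut off, so the choice of properties and the values are my assumptions and not necessarily the ones the post goes on to recommend. Newer Hadoop versions expose the split sizes under mapreduce.input.fileinputformat.split.* names.

```sql
-- Combine many small files into fewer splits (fewer map tasks).
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Upper bound for one split, in bytes: a smaller value gives more map tasks.
SET mapred.max.split.size=256000000;

-- Lower bound for one split, in bytes: a larger value gives fewer map tasks.
SET mapred.min.split.size=128000000;
```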

Hive Regular Expression SerDe

Unless there is no way to use the built-in parsers, I do not recommend writing a user-defined SerDe. Do not forget that Hive comes with a contrib RegexSerDe class, which can tokenize your logs/files to resol...
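
A minimal sketch of a table that uses the contrib RegexSerDe might look like this. The table name, the columns, the jar path and the three-field log layout are all hypothetical. Each capture group in input.regex maps to one column in order, and the contrib SerDe expects every column to be STRING.

```sql
-- Hypothetical jar location; adjust to your installation.
ADD JAR /path/to/hive-contrib.jar;

-- Hypothetical log line layout: "<host> <timestamp> <free-text message>"
CREATE TABLE access_log (
  host    STRING,
  ts      STRING,
  message STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) (.*)"
)
STORED AS TEXTFILE;
```

Lines that do not match the pattern come back as rows of NULLs, so it is worth running a quick SELECT on a small sample before loading everything.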