archive: 2012/6

Hive Performance Tuning - No. of MapReduce

1. Set a proper number of map tasks. Most of the time, a job generates one or more map tasks depending on its input directories. Several factors come into play, such as the number of input files, the size of the input files
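As a rough illustration of how input files turn into map tasks, here is a minimal Python sketch of the split arithmetic used by Hadoop's `FileInputFormat`. It assumes one split per split-size chunk, with a 1.1 "slop" factor allowing the last chunk to be slightly oversized, and no combining of small files (Hive can merge small files through `CombineHiveInputFormat`, selected via `hive.input.format`, so real counts may differ). The function names and file sizes are hypothetical. `min_size` and `max_size` stand in for the Hadoop 1.x `mapred.min.split.size` / `mapred.max.split.size` settings.

```python
# Minimal sketch of Hadoop FileInputFormat-style split counting.
# Assumptions: no small-file combining, zero-length files are ignored.
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size
MB = 1024 * 1024


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split size = max(min_size, min(max_size, block_size))."""
    return max(min_size, min(max_size, block_size))


def splits_for_file(file_size, split_size):
    """Count the splits (and thus map tasks) for a single file."""
    count = 0
    remaining = file_size
    # Cut full-size splits while the remainder is clearly larger than one split.
    while remaining / split_size > SPLIT_SLOP:
        count += 1
        remaining -= split_size
    # Whatever is left (up to 1.1 * split_size) becomes the final split.
    if remaining > 0:
        count += 1
    return count


def estimate_map_tasks(file_sizes, block_size, min_size=1, max_size=2**63 - 1):
    """Sum the splits over all input files: one map task per split."""
    split_size = compute_split_size(block_size, min_size, max_size)
    return sum(splits_for_file(size, split_size) for size in file_sizes)


# Hypothetical input: a 1 GB file, a 10 MB file and a 130 MB file, 128 MB blocks.
files = [1024 * MB, 10 * MB, 130 * MB]
print(estimate_map_tasks(files, block_size=128 * MB))                    # 10
print(estimate_map_tasks(files, block_size=128 * MB, max_size=64 * MB))  # 19
```

Lowering the maximum split size below the block size raises the map count (19 instead of 10 here), while the 130 MB file still fits in one 128 MB split thanks to the slop factor.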

Hive Regular Expression SerDe

Unless there is no way to use an internal parser, I do not recommend writing a user-defined SerDe. Do not forget that Hive comes with a contrib RegexSerDe class, which can tokenize your logs/files to resol
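To show what tokenizing with a regular expression means here, below is a minimal Python sketch of the idea behind RegexSerDe, not its actual Java implementation: the whole line must match the pattern (the role played by the `input.regex` SerDe property), and each capture group becomes one string column. The Apache-style log line and the pattern are hypothetical examples, and returning `None` for a non-matching line is simply this sketch's way of representing a NULL row.

```python
import re
from typing import List, Optional

# Hypothetical pattern for an Apache common-log line; each group is one column.
LOG_REGEX = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^\]]*)\] "([^"]*)" ([0-9]*) ([0-9-]*)'
)


def deserialize(line: str, pattern=LOG_REGEX) -> Optional[List[str]]:
    """Split one line into string columns, one per capture group.

    The entire line must match (like Java's Matcher.matches()); otherwise
    None is returned to stand in for a NULL row.
    """
    match = pattern.fullmatch(line)
    if match is None:
        return None
    return list(match.groups())


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(deserialize(line))
print(deserialize("not a log line"))  # None
```

In a Hive table the same pattern would go into the table's SERDEPROPERTIES under `input.regex`, with one column declared per capture group.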