Hive unable to manually set number of reducers

Writing a query in Hive like this:

 SELECT COUNT(DISTINCT id) ....

will always use a single reducer, no matter how many reducers you ask for.
To work around this:

  1. Set the desired number of reducers with this command:

    set mapred.reduce.tasks=50;

  2. Rewrite the query as follows:

    SELECT COUNT(*) FROM ( SELECT DISTINCT id FROM … ) t;

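This results in two MapReduce jobs instead of one, but the performance gain is substantial: the inner SELECT DISTINCT can be spread across many reducers, and the outer COUNT(*) only has to count rows that are already deduplicated.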
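
As a concrete sketch of the two steps run together, assuming a hypothetical table named page_views with an id column (neither name comes from the original query):

```sql
-- Step 1: request 50 reducers for the jobs that follow.
-- On Hadoop 2 and later the same property is also known as mapreduce.job.reduces.
set mapred.reduce.tasks=50;

-- Step 2: deduplicate in a subquery, then count the remaining rows.
-- page_views is a hypothetical table name used only for illustration.
SELECT COUNT(*)
FROM (
  SELECT DISTINCT id
  FROM page_views
) t;
```

Prefixing the query with EXPLAIN should print the stages Hive plans to run, which is a quick way to check that the rewrite produces two jobs before running it on a large table.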
