Hive unable to manually set number of reducers

Writing a query in Hive like this:

 SELECT COUNT(DISTINCT id) ....

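will always run with only one reducer, no matter how many reducers you request, because the final distinct count over the whole table is computed in a single reduce task. You should:

1. Use this command to set the desired number of reducers:

 set mapred.reduce.tasks=50;

2. Rewrite the query as follows:

 SELECT COUNT(*) FROM ( SELECT DISTINCT id FROM … ) t;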

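This results in two MapReduce jobs instead of one, but the performance gain can be substantial. The inner SELECT DISTINCT is spread across all the reducers you requested. The outer COUNT(*) still runs in a single reducer, but it only has to count ids that are already deduplicated.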
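The reason the two-job rewrite still gives the correct answer can be sketched with a simplified model in plain Python. This is not Hive code. The function names are hypothetical, and the hash partitioning only approximates how rows are shuffled to reducers (similar in spirit to Hadoop's default HashPartitioner). Because all copies of the same id are sent to the same reducer, each reducer can deduplicate its own share independently, and the per-reducer distinct counts simply add up.

```python
from collections import defaultdict


def partition(key, num_reducers):
    # Hypothetical stand-in for the shuffle: pick a reducer by hashing the key,
    # so identical ids always land on the same reducer.
    return hash(key) % num_reducers


def count_distinct_single_reducer(ids):
    # Model of COUNT(DISTINCT id): one reducer must see and deduplicate every id.
    return len(set(ids))


def count_distinct_two_stage(ids, num_reducers):
    # Job 1 (SELECT DISTINCT id): each reducer deduplicates only its own bucket.
    buckets = defaultdict(set)
    for row_id in ids:
        buckets[partition(row_id, num_reducers)].add(row_id)
    # Job 2 (COUNT(*)): no id appears in two buckets, so the bucket sizes sum
    # to the global distinct count.
    return sum(len(bucket) for bucket in buckets.values())


ids = [3, 7, 3, 42, 7, 7, 100, 5]
print(count_distinct_single_reducer(ids), count_distinct_two_stage(ids, 4))
```

In the model, the first stage's work is divided among `num_reducers` buckets rather than done by one reducer, which is where the speedup in the real query comes from. The second stage only adds up already-distinct results.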

More Related Contents:

how many mappers and reducers will get created for a partitioned table in hive
What is Hive: Return Code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
Can Hive recursively descend into subdirectories without partitions or editing hive-site.xml?
How does Hadoop process records split across block boundaries?
merge output files after reduce phase
What is the difference between partitioning and bucketing a table in Hive ?
Setting the number of map tasks and reduce tasks
Difference between Hive internal tables and external tables?
How to transpose/pivot data in hive?
Is it better to use the mapred or the mapreduce package to create a Hadoop Job?
Sqoop import : composite primary key and textual primary key
Chaining multiple MapReduce jobs in Hadoop
When do reduce tasks start in Hadoop?
How to delete and update a record in Hive
How to get the input file name in the mapper in a Hadoop program?
Hadoop speculative task execution
How to update table in Hive 0.13?
hadoop map reduce secondary sorting
Hive: Add partitions for existing folder structure
Where does hadoop mapreduce framework send my System.out.print() statements ? (stdout)
How to load data to hive from HDFS without removing the source file?
How does Hadoop perform input splits?
Hadoop input split size vs block size
Default number of reducers
Create HIVE Table with multi character delimiter
Job queue for Hive action in oozie
Difference between Pig and Hive? Why have both? [closed]
What is Google’s Dremel? How is it different from Mapreduce?
Hadoop WordCount example stuck at map 100% reduce 0%
Hive Data Retrieval Queries: Difference between CLUSTER BY, ORDER BY, and SORT BY