How to run a jar file in Hadoop?

I was able to reproduce your problem. The issue is in how you are creating the jar: the directory you are packaging into it keeps the jar from locating the main class file. Instead, try `/usr/lib/jvm/jdk1.7.0_07/bin/jar cf Dictionary.jar /home/hduser/dir/Dictionary.class`, i.e. package the class file specifically into the … Read more
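Once the jar is built, the job is launched with the `hadoop jar` command. A minimal sketch, assuming the main class is named `Dictionary`; the HDFS input/output paths below are made-up values, not from the question:

```sh
# Run the jar on the cluster; the main class name and the
# input/output paths here are illustrative assumptions.
hadoop jar Dictionary.jar Dictionary /user/hduser/input /user/hduser/output
```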

Default number of reducers

How many reduces? (From the official documentation.) The right number of reduces seems to be 0.95 or 1.75 multiplied by (no. of nodes) * (max no. of containers per node). With 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes will finish … Read more
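To make the arithmetic concrete, here is a hedged sketch of applying the 0.95 rule in a driver. The cluster figures (10 nodes, 8 containers per node) are made-up values for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "reducer-count-example");

        // Hypothetical cluster: 10 worker nodes, 8 containers each.
        int nodes = 10;
        int maxContainersPerNode = 8;

        // 0.95 factor -> one wave of reduces: 0.95 * 10 * 8 = 76 reducers.
        // Using 1.75 instead would give 140, for better load balancing.
        int reduces = (int) (0.95 * nodes * maxContainersPerNode);
        job.setNumReduceTasks(reduces);
    }
}
```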

What is the use of the grouping comparator in Hadoop MapReduce?

In support of the chosen answer I add the following, building on this explanation.

**Input** (one record per line):

| symbol | time | price |
| --- | --- | --- |
| a | 1 | 10 |
| a | 2 | 20 |
| b | 3 | 30 |

**Map output**: create composite keys/values like so:

| symbol-time | time-price |
| --- | --- |
| **a-1** | 1-10 |
| **a-2** | 2-20 |
| **b-3** | 3-30 |

The Partitioner will route the a-1 and a-2 keys to the same reducer … Read more
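A minimal sketch of such a grouping comparator, assuming the composite key is a plain `Text` of the form `symbol-time` (a real job would usually use a custom `WritableComparable` with separate fields):

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Groups composite "symbol-time" keys by the symbol part only, so a-1 and a-2
// arrive in the same reduce() call, with values still sorted by the full key.
public class SymbolGroupingComparator extends WritableComparator {
    protected SymbolGroupingComparator() {
        super(Text.class, true); // create Text instances for deserialization
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        String symbolA = a.toString().split("-")[0];
        String symbolB = b.toString().split("-")[0];
        return symbolA.compareTo(symbolB);
    }
}
```

It would be registered on the job with `job.setGroupingComparatorClass(SymbolGroupingComparator.class)`.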

How to list all files in a directory and its subdirectories in Hadoop HDFS

If you are using the Hadoop 2.* API, there is a more elegant solution:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.mapreduce.Job;

// getConf() assumes this runs inside a class extending Configured (e.g. a Tool).
Configuration conf = getConf();
Job job = Job.getInstance(conf);
FileSystem fs = FileSystem.get(conf);

// The second boolean parameter here sets the recursion to true.
RemoteIterator<LocatedFileStatus> fileStatusListIterator =
        fs.listFiles(new Path("path/to/lib"), true);
while (fileStatusListIterator.hasNext()) {
    LocatedFileStatus fileStatus = fileStatusListIterator.next();
    // Do stuff with the file, like …
    job.addFileToClassPath(fileStatus.getPath());
}
```

Hadoop: …be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation

This error is raised by the block replication system of HDFS when it could not manage to make any copies of a specific block within the targeted file. Common reasons for that:

- Only a NameNode instance is running, and it is not in safe mode.
- There are no DataNode instances up and running, or some are dead. … Read more
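The first two causes can be checked quickly from the shell with the standard HDFS admin commands (output format varies by version):

```sh
hdfs dfsadmin -report        # lists live/dead DataNodes and their capacity
hdfs dfsadmin -safemode get  # reports whether the NameNode is in safe mode
```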