Hadoop java.io.IOException: Mkdirs failed to create /some/path

Just ran into this problem running Mahout from CDH4 in standalone mode on my MacBook Air. The issue is that a /tmp/hadoop-xxx/xxx/LICENSE file and a /tmp/hadoop-xxx/xxx/license directory are being created on a case-insensitive file system when unjarring the Mahout job jars. I was able to work around this by deleting META-INF/LICENSE from the jar file like this: … Read more
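The command itself is cut off above; a minimal sketch of one way to strip the entry, assuming the standard zip utility and using mahout-examples-job.jar as a stand-in for your actual job jar:

# Remove the conflicting entry from the job jar (the jar name here is illustrative).
zip -d mahout-examples-job.jar META-INF/LICENSE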

There are 0 datanode(s) running and no node(s) are excluded in this operation

Two things worked for me.

STEP 1: stop Hadoop and clean the temp files from hduser:

sudo rm -R /tmp/*

You may also need to delete and recreate /app/hadoop/tmp (mostly when I change the Hadoop version, e.g. from 2.2.0 to 2.7.0):

sudo rm -r /app/hadoop/tmp
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp

… Read more
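Once the daemons are restarted, a quick way to confirm that a DataNode has actually registered with the NameNode (assuming the hdfs CLI is on your PATH):

# Should report at least one live datanode once registration succeeds.
hdfs dfsadmin -report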

Python read file as stream from HDFS

You want xreadlines; it reads lines from a file without loading the whole file into memory. Edit: Now I see your question, you just need to get the stdout pipe from your Popen object:

cat = subprocess.Popen(["hadoop", "fs", "-cat", "/path/to/myfile"], stdout=subprocess.PIPE)
for line in cat.stdout:
    print line
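If you just want to sanity-check the stream from the shell first, the snippet above simply wraps this command, which you can run directly:

# Print the first few lines of the HDFS file (same path as in the snippet).
hadoop fs -cat /path/to/myfile | head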

How to fix corrupt HDFS Files

You can use hdfs fsck / to determine which files are having problems. Look through the output for missing or corrupt blocks (ignore under-replicated blocks for now). This command is really verbose, especially on a large HDFS filesystem, so I normally get down to the meaningful output with hdfs fsck / | egrep -v '^\.+$' … Read more
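The rest of the answer is truncated; a sketch of typical follow-up steps, using standard fsck options (the file path is illustrative, not from the original answer):

# List only the files with corrupt blocks.
hdfs fsck / -list-corruptfileblocks

# Inspect one suspect file in detail: its blocks and which datanodes hold them.
hdfs fsck /path/to/suspect/file -files -blocks -locations

# Once you have given up on recovering the data, either move the corrupt
# files to /lost+found or delete them outright.
hdfs fsck / -move
hdfs fsck / -delete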

Name node is in safe mode. Not able to leave

To force the namenode to leave safe mode, run the following command: bin/hadoop dfsadmin -safemode leave. You are getting the Unknown command error because -safemode isn't a sub-command of hadoop fs; it belongs to hadoop dfsadmin. Also, after the above command, I would suggest you run hadoop fsck … Read more
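On newer Hadoop releases the hadoop dfsadmin form is deprecated in favour of the hdfs front-end; a minimal sketch of checking and then leaving safe mode, followed by the suggested filesystem check:

# Report whether the namenode is currently in safe mode.
hdfs dfsadmin -safemode get

# Force it out of safe mode.
hdfs dfsadmin -safemode leave

# Then check the filesystem for missing or corrupt blocks.
hdfs fsck /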

Write a file in HDFS with Java

As an alternative to @Tariq's answer, you could pass the URI when getting the filesystem:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.conf.Configuration;
import java.net.URI;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Progressable;
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;

Configuration configuration = new Configuration();
FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:54310"), configuration);
Path file = new Path("hdfs://localhost:54310/s2013/batch/table.html");
if (hdfs.exists(file)) {
    hdfs.delete( … Read more
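Assuming the truncated remainder goes on to write the file (the BufferedWriter and OutputStreamWriter imports suggest it does), you can verify the result from the shell; the path matches the snippet above:

# List the target directory and print the file that was written.
hadoop fs -ls /s2013/batch/
hadoop fs -cat /s2013/batch/table.html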