Default Namenode port of HDFS is 50070, but in some places I have come across 8020 or 9000 [closed]

The default Hadoop ports are as follows (these are HTTP ports; they serve a web UI):

Daemon                  Default Port   Configuration Parameter
----------------------  ------------   --------------------------------
Namenode                50070          dfs.http.address
Datanodes               50075          dfs.datanode.http.address
Secondarynamenode       50090          dfs.secondary.http.address
Backup/Checkpoint node  50105          dfs.backup.http.address
Jobtracker              50030          mapred.job.tracker.http.address
Tasktrackers            50060          mapred.task.tracker.http.address

Internally, Hadoop mostly uses Hadoop IPC (Inter-Process Communication) to communicate amongst … Read more
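As a quick way to check which values are actually in effect on a given cluster, here is a minimal Scala sketch. It assumes the Hadoop 1.x property names from the table above and hadoop-core on the classpath; PortCheck is a hypothetical name, not part of the answer:

    import org.apache.hadoop.conf.Configuration

    object PortCheck {
      def main(args: Array[String]): Unit = {
        // Picks up core-site.xml / hdfs-site.xml from the classpath, if present.
        val conf = new Configuration()
        // Each get() falls back to the stock default when no *-site.xml overrides it.
        println(conf.get("dfs.http.address", "0.0.0.0:50070"))                // Namenode web UI
        println(conf.get("dfs.datanode.http.address", "0.0.0.0:50075"))       // Datanode web UI
        println(conf.get("mapred.job.tracker.http.address", "0.0.0.0:50030")) // Jobtracker web UI
        // The 8020/9000 ports from the question title belong to the Namenode's
        // IPC (RPC) address, configured separately from the HTTP ports above:
        println(conf.get("fs.default.name", "file:///"))
      }
    }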

Spark iterate HDFS directory

You can use org.apache.hadoop.fs.FileSystem; specifically, FileSystem.listFiles([path], true). And with Spark: FileSystem.get(sc.hadoopConfiguration).listFiles(…, true). Edit: it's worth noting that good practice is to get the FileSystem associated with the Path's scheme: path.getFileSystem(sc.hadoopConfiguration).listFiles(path, true)
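Put together, a runnable Scala sketch of the recursive listing, assuming an existing SparkContext sc (as in spark-shell) and a made-up example path:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val path = new Path("hdfs:///some/dir")              // hypothetical example path
    // Resolve the FileSystem from the Path's own scheme (the good practice noted above).
    val fs = path.getFileSystem(sc.hadoopConfiguration)
    val files = fs.listFiles(path, true)                 // true = recurse into subdirectories
    while (files.hasNext) {
      val status = files.next()                          // a LocatedFileStatus
      println(status.getPath)
    }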

Namenode not getting started

I was facing the issue of the namenode not starting. I found a solution using the following steps: first, delete all contents from the temporary folder: rm -Rf <tmp dir> (mine was /usr/local/hadoop/tmp); then format the namenode: bin/hadoop namenode -format; then start all processes again: bin/start-all.sh. You may also consider rolling back using a checkpoint (if you had it enabled).
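The same steps as a single shell session, mirroring the answer's own commands. This is a sketch: the tmp path is the answerer's, so substitute your own hadoop.tmp.dir, and note that wiping it destroys all HDFS data:

    cd /usr/local/hadoop
    rm -Rf /usr/local/hadoop/tmp/*     # wipe the temporary dir (destroys HDFS data!)
    bin/hadoop namenode -format        # reformat the namenode
    bin/start-all.sh                   # restart all daemons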

Amazon s3a returns 400 Bad Request with Spark

This message corresponds to something like "bad endpoint" or an unsupported signature version. As seen here, Frankfurt is the only region that does not support Signature Version 2, and it's the one I picked. Of course, even after all my research I can't say exactly what a signature version is; it's not obvious in the documentation. But V2 seems … Read more
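For reference, the fix that usually follows from this diagnosis is to point s3a at the region's own endpoint and force Signature Version 4 in the AWS SDK. A Scala sketch follows; the fs.s3a.* keys and the enableV4 system property come from the Hadoop s3a and AWS SDK documentation, not from this answer, and the bucket and key are made up:

    import org.apache.spark.{SparkConf, SparkContext}

    // Signature Version 4 must be enabled in the AWS SDK before any s3a client is created.
    System.setProperty("com.amazonaws.services.s3.enableV4", "true")

    // Master is assumed to be supplied by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("s3a-frankfurt"))
    val hc = sc.hadoopConfiguration
    hc.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com") // Frankfurt's regional endpoint
    hc.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    hc.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

    println(sc.textFile("s3a://my-bucket/some/key.txt").count()) // hypothetical bucket/key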