Spark iterate HDFS directory

You can use org.apache.hadoop.fs.FileSystem. Specifically, FileSystem.listFiles(path, true), which recursively lists the files under path and returns a RemoteIterator of LocatedFileStatus.

And with Spark…

FileSystem.get(sc.hadoopConfiguration).listFiles(..., true)
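Note that listFiles returns a Hadoop RemoteIterator, not a Scala Iterator, so it has to be drained manually. A minimal sketch, where hdfs:///some/dir is just a placeholder path:

import org.apache.hadoop.fs.{FileSystem, Path}

// Default FileSystem from the Spark context's Hadoop configuration
val fs = FileSystem.get(sc.hadoopConfiguration)

// listFiles(path, recursive = true) yields a RemoteIterator[LocatedFileStatus]
val it = fs.listFiles(new Path("hdfs:///some/dir"), true)
while (it.hasNext) {
  println(it.next().getPath)
}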

Edit

It’s worth noting that good practice is to get the FileSystem associated with the Path’s scheme, since FileSystem.get(conf) returns the default filesystem (fs.defaultFS), which may not match the path you are listing:

path.getFileSystem(sc.hadoopConfiguration).listFiles(path, true)
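For example, a minimal sketch that works the same way for hdfs://, s3a://, or file:// URIs, because the FileSystem is resolved from the path’s own scheme (again, the path string is a placeholder):

import org.apache.hadoop.fs.Path

val path = new Path("hdfs:///some/dir")
// Resolve the FileSystem from the path's scheme rather than fs.defaultFS
val fs = path.getFileSystem(sc.hadoopConfiguration)
val it = fs.listFiles(path, true)
while (it.hasNext) {
  println(it.next().getPath)
}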
