Try it with another directory: start the job first, then copy the files into that directory while the job is running.
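A minimal sketch of that workflow, assuming the job monitors a directory with something like Spark's `textFileStream`. The directory paths here (`/tmp/spark-watch-demo`, `/tmp/spark-stage-demo`) are illustrative. Because a file written in place can be observed half-written, the usual pattern is to write the file in a staging directory and then `mv` (rename) it into the monitored directory, so it appears atomically:

```shell
# Illustrative paths (assumptions, not from the original answer):
WATCH_DIR=/tmp/spark-watch-demo   # the directory the streaming job monitors
STAGE_DIR=/tmp/spark-stage-demo   # staging dir on the same filesystem

mkdir -p "$WATCH_DIR" "$STAGE_DIR"

# 1. Write the complete file in the staging directory first.
printf 'line1\nline2\n' > "$STAGE_DIR/events.txt"

# 2. Move it into the monitored directory. Within one filesystem,
#    mv is a rename, so the file appears there all at once and the
#    running job sees it as a single new file.
mv "$STAGE_DIR/events.txt" "$WATCH_DIR/events.txt"

ls "$WATCH_DIR"
```

The key point is that the move happens *after* the job has started, since directory-monitoring streams typically only pick up files that appear once monitoring has begun.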