Write Spark dataframe as CSV with partitions

Spark 2.0.0+: The built-in csv format supports partitioning out of the box, so you should be able to simply use df.write.partitionBy('partition_date').mode(mode).format("csv").save(path) without including any additional packages.
Spark < 2.0.0: At the time of writing (v1.4.0), spark-csv doesn't support partitionBy (see databricks/spark-csv#123), but you can adjust built-in sources to achieve what you want. You can try two different approaches. …

CSV with comma or semicolon?

On Windows, it depends on the List separator setting in the “Regional and Language Options” customize screen. This is the character Windows applications expect to be the CSV separator. Of course, this only affects Windows applications. For example, Excel will not automatically split data into columns if the file is not …
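Because the expected separator differs from machine to machine, code that reads such files cannot safely assume a comma. As a minimal sketch, assuming Python and a made-up sample (the function name detect_delimiter and the data are hypothetical, not from the original answer), the standard library's csv.Sniffer can guess between the two candidates:

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;") -> str:
    """Guess which of the candidate characters separates fields in `sample`."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# Hypothetical sample, shaped like an export from a semicolon-separator locale.
sample = "name;age;city\nAda;36;London\nLin;29;Oslo\n"

delimiter = detect_delimiter(sample)
rows = list(csv.reader(sample.splitlines(), delimiter=delimiter))
```

The sniffer is a heuristic: it raises csv.Error when it cannot settle on a delimiter, so real code would likely want a fallback for very short or irregular files.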

Export as CSV in Beeline/Hive

With Hive 0.11.0 or later, you can execute: INSERT OVERWRITE LOCAL DIRECTORY '/tmp/directoryWhereToStoreData' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY "\n" SELECT * FROM yourTable; from hive/beeline to store the table in a directory on the local filesystem. Alternatively, with beeline, save your SELECT query in yourSQLFile.sql and run: beeline …

How to check encoding of a CSV file

You can use Notepad++ to check a file’s encoding without writing any code. The detected encoding of the open file is shown at the far right of the bottom status bar. To see which encodings are supported, go to Settings -> Preferences -> New Document/Default Directory and look in the drop-down.
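Where opening an editor is impractical (many files, or a machine without a GUI), a rough scripted check is possible. The sketch below is a different, much simpler technique than whatever Notepad++ does internally: it only looks for a byte order mark and then tests whether the bytes decode as UTF-8. The function name guess_encoding is hypothetical.

```python
import codecs
from typing import Optional

# BOM signatures; UTF-32 LE is listed before UTF-16 LE because its BOM
# begins with the same two bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(raw: bytes) -> Optional[str]:
    """Return a likely encoding name, or None if it is not BOM-marked or UTF-8."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Probably a legacy 8-bit code page; this sketch cannot tell which one.
        return None
```

To use it on a file, read the bytes in binary mode, for example guess_encoding(open(path, "rb").read()). A None result means only that the data is neither BOM-marked nor valid UTF-8; telling legacy code pages apart requires statistical detection, which this sketch does not attempt.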

VBScript to loop through all files in a folder

Maybe this will clear things up. (Or confuse you more.)
Const ForReading = 1
Const ForWriting = 2
sFolder = "H:\Letter Display\Letters\"
Set oFSO = CreateObject("Scripting.FileSystemObject")
For Each oFile In oFSO.GetFolder(sFolder).Files
    If UCase(oFSO.GetExtensionName(oFile.Name)) = "LTR" Then
        ProcessFiles oFSO, oFile
    End If
Next
Set oFSO = Nothing
Sub ProcessFiles(FSO, File)
    Set oFile2 = FSO.OpenTextFile(File.path, ForReading)
    …