How to check encoding of a CSV file

You can use Notepad++ to check a file's encoding without writing any code. The detected encoding of the open file is shown in the status bar at the bottom, on the far right. The supported encodings can be seen by going to Settings -> Preferences -> New Document/Default Directory and looking in the drop-down.
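If a scripted check is preferred over opening the file in an editor, the following is a minimal Python sketch of one common heuristic: look for a byte-order mark, then try a short list of candidate encodings. The function names and the candidate list are assumptions made for illustration, and this is not how Notepad++ itself detects encodings. A successful decode only shows that an encoding is plausible, not that it is the right one.

```python
import codecs
from pathlib import Path

# BOM signatures. UTF-32 comes first because the UTF-32 LE mark
# starts with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, candidates=("utf-8", "cp1252")) -> str:
    """Return a plausible encoding name for raw bytes (heuristic only)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so it never fails to decode.
    return "latin-1"


def guess_file_encoding(path) -> str:
    """Read the whole file as bytes and guess its encoding."""
    return guess_encoding(Path(path).read_bytes())
```

The order of the candidates matters: strict encodings such as UTF-8 should be tried before permissive single-byte ones, because the permissive ones accept almost any input.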

VBScript to loop through all files in a folder

Maybe this will clear things up. (Or confuse you more.)

    Const ForReading = 1
    Const ForWriting = 2

    sFolder = "H:\Letter Display\Letters\"
    Set oFSO = CreateObject("Scripting.FileSystemObject")

    For Each oFile In oFSO.GetFolder(sFolder).Files
      If UCase(oFSO.GetExtensionName(oFile.Name)) = "LTR" Then
        ProcessFiles oFSO, oFile
      End If
    Next

    Set oFSO = Nothing

    Sub ProcessFiles(FSO, File)
      Set oFile2 = FSO.OpenTextFile(File.path, ForReading)
      …

Parse a csv using awk and ignoring commas inside a field

    gawk -vFPAT='[^,]*|"[^"]*"' '{print $1 "," $3}' | sort | uniq

This is an awesome GNU Awk 4 extension, where you define a field pattern instead of a field-separator pattern. Does wonders for CSV. (docs)

ETA (thanks mitchus): To remove the surrounding quotes, gsub("^\"|\"$","",$3); if there are more fields than just $3 to process that way, just …
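For comparison, here is a rough Python sketch of the same pipeline: print the first and third fields, then sort and de-duplicate. It uses the standard csv module rather than a field pattern, so it is a swapped-in technique and not a port of FPAT. The function name is hypothetical. Unlike the gawk one-liner, csv.reader strips the surrounding quotes from quoted fields on its own.

```python
import csv


def first_and_third(lines):
    """Return sorted, unique (field 1, field 3) pairs from CSV lines.

    Commas inside double-quoted fields are treated as data,
    not as separators.
    """
    pairs = set()
    for row in csv.reader(lines):
        if len(row) >= 3:  # skip rows that have no third field
            pairs.add((row[0], row[2]))
    return sorted(pairs)


if __name__ == "__main__":
    sample = ['a,"b,c",d', 'a,"x,y",d', 'e,f,"g,h"']
    for first, third in first_and_third(sample):
        print(first, third, sep=",")
```

Note that the demo prints the fields joined by a plain comma, as the gawk command does, so a field that itself contains a comma is not re-quoted on output.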

Reading csv files with quoted fields containing embedded commas

I noticed that your problematic line has escaping that uses double quotes themselves: "32 XIY ""W"" JK, RE LK" which should be interpreted just as 32 XIY "W" JK, RE LK. As described in RFC 4180, page 2: If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped …
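As an illustration of that rule, here is a small Python sketch using the standard csv module, whose default dialect treats a doubled double-quote inside a quoted field as one literal quote. The helper name is made up for the example, and the original question may have used a different tool.

```python
import csv


def parse_line(line):
    """Parse one CSV record and return its fields as a list."""
    return next(csv.reader([line]))


if __name__ == "__main__":
    fields = parse_line('"32 XIY ""W"" JK, RE LK"')
    print(fields[0])  # the embedded comma and the quotes around W survive
```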

How to export a Hive table into a CSV file?

Or use this:

    hive -e 'select * from your_Table' | sed 's/[\t]/,/g' > /home/yourfile.csv

You can also set the property hive.cli.print.header=true before the SELECT so that the header row is written to the file along with the data. For example:

    hive -e 'set hive.cli.print.header=true; select * from your_Table' | sed 's/[\t]/,/g' > /home/yourfile.csv

If you don't …
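One caveat with replacing tabs by commas using sed: a value that already contains a comma ends up split across two columns. A hedged alternative, assuming the query output is tab-separated text as the sed command implies, is to convert it with a CSV writer that quotes fields when needed. The function name below is hypothetical.

```python
import csv


def tsv_to_csv(src, dst):
    """Copy tab-separated rows from src to dst as properly quoted CSV.

    src and dst are text file objects (for example sys.stdin and
    sys.stdout when used at the end of a shell pipeline).
    """
    # QUOTE_NONE: quote characters in the input are ordinary data.
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow(row)  # fields with commas or quotes get quoted
```

Called as tsv_to_csv(sys.stdin, sys.stdout) in a small script, this could take the place of the sed step in the pipeline.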

Spark dataframe save in single file on hdfs location [duplicate]

It's not possible using the standard Spark library, but you can use the Hadoop API for managing the filesystem: save the output in a temporary directory and then move the file to the requested path. For example (in pyspark):

    df.coalesce(1) \
        .write.format("com.databricks.spark.csv") \
        .option("header", "true") \
        .save("mydata.csv-temp")

    from py4j.java_gateway import java_import
    java_import(spark._jvm, 'org.apache.hadoop.fs.Path')

    fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
    file = fs.globStatus(sc._jvm.Path('mydata.csv-temp/part*'))[0].getPath().getName()
    fs.rename(sc._jvm.Path('mydata.csv-temp/' …

Can you encode CR/LF in CSV files?

Yes, you need to wrap the value in quotes:

    "some value
    over two lines",some other value

From this document, which is the generally accepted CSV standard:

    Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.
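A short Python sketch of the same idea, using the standard csv module: the writer quotes a field that contains a line break, and the reader restores it as a single value. The helper names are hypothetical. Passing newline="" keeps Python from translating the line endings, which the csv documentation recommends for file objects as well.

```python
import csv
import io


def write_rows(rows):
    """Serialize rows to CSV text; fields with line breaks get quoted."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def read_rows(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    rows = [["some value\r\nover two lines", "some other value"]]
    text = write_rows(rows)
    print(repr(text))
    print(read_rows(text) == rows)
```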