Is there a way to include commas in CSV columns without breaking the formatting?

Enclose the field in double quotes, e.g. field1_value,field2_value,"field 3,value",field4, etc. See Wikipedia. Updated: To encode a quote, use another ": one double quote symbol in a field will be encoded as "", and the whole field will become """". So if you see the following in e.g. Excel:

    ---------------------------------------
    | regular_value |,,,"| ,"", |""" |"|
    ---------------------------------------

the …
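As a quick check of the quoting rule described in this answer, here is a minimal sketch using Python's standard csv module. The helper name to_csv_line is made up for illustration and is not part of the original answer; the sketch assumes the module's default "excel" dialect with minimal quoting.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one row with the csv module and return it without the line ending."""
    buf = io.StringIO()
    # lineterminator="\n" keeps the output easy to compare; the default is "\r\n".
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue().rstrip("\n")


# A field containing a comma is wrapped in double quotes; the others are left bare.
line = to_csv_line(["field1_value", "field2_value", "field 3,value", "field4"])
print(line)

# A field that is just one double quote: the quote is doubled, then the field is quoted.
lone_quote = to_csv_line(['"'])
print(lone_quote)

# Reading the line back recovers the original single character.
round_trip = next(csv.reader([lone_quote]))
print(round_trip)
```

Letting a CSV library do the quoting is usually less error-prone than joining strings with commas by hand, since the library decides per field whether quotes are needed.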

Dealing with commas in a CSV file

There’s actually a spec for the CSV format, RFC 4180, which covers how to handle commas:

    Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.

http://tools.ietf.org/html/rfc4180

So, to have values foo and bar,baz, you do this:

    foo,"bar,baz"

Another important requirement to consider (also from the spec): If double-quotes are used to enclose …
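To see the quoted rule from the reading side, the following sketch parses a small sample with Python's standard csv module. The sample text is invented for illustration, and the csv module is only assumed to be a reasonable stand-in for an RFC 4180 style parser, not a strict implementation of the spec.

```python
import csv
import io

# Two records with CRLF line endings. The second record's first field
# contains an embedded line break, so it is enclosed in double quotes.
sample = 'foo,"bar,baz"\r\n"line one\r\nline two",end\r\n'

# newline="" stops Python from translating line endings, so the reader
# can keep the line break that sits inside the quoted field.
rows = list(csv.reader(io.StringIO(sample, newline="")))

for row in rows:
    print(row)
```

The comma inside "bar,baz" is treated as data instead of a separator, and the quoted line break does not start a new record, so the sample yields two rows of two fields each.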

What’s the most robust way to efficiently parse CSV using awk?

If your CSV cannot contain newlines then all you need is (with GNU awk for FPAT):

    $ echo 'foo,"field,""with"",commas",bar' | awk -v FPAT='[^,]*|("([^"]|"")*")' '{for (i=1; i<=NF; i++) print i " <" $i ">"}'
    1 <foo>
    2 <"field,""with"",commas">
    3 <bar>

or the equivalent using any awk:

    $ echo 'foo,"field,""with"",commas",bar' | awk -v fpat='[^,]*|("([^"]|"")*")' -v OFS=',' '{ rec …