Escaping separator within double quotes, in awk

This is easy with GNU awk 4.0 or later, which added the FPAT variable:

zsh-4.3.12[t]% awk '{ 
 for (i = 0; ++i <= NF;)
   printf "field %d => %s\n", i, $i
 }' FPAT='([^,]+)|("[^"]+")' infile
field 1 => filed1
field 2 => filed2
field 3 => field3
field 4 => "field4,FOO,BAR"
field 5 => field5

Some explanation, as the OP requested.

From the GNU awk manual, in the section “Defining Fields by Content”:

The value of FPAT should be a string that provides a regular
expression. This regular expression describes the contents of each
field. In the case of CSV data as presented above, each field is
either “anything that is not a comma,” or “a double quote, anything
that is not a double quote, and a closing double quote.” If written as
a regular expression constant, we would have /([^,]+)|("[^"]+")/. Writing this as a string
requires us to escape the double quotes, leading to:

FPAT = "([^,]+)|(\"[^\"]+\")"

Because both alternatives use +, this pattern does not handle empty fields properly. The manual describes a fix:

As written, the regexp used for FPAT requires that each field contain at least one character. A straightforward modification (changing the first ‘+’ to ‘*’) allows fields to be empty:

FPAT = "([^,]*)|(\"[^\"]+\")"
