Parsing CSV in Java

String.split(",") isn’t likely to work for CSV.
It will split fields that have embedded commas (“Foo, Inc.”) even though they are a single field in the CSV line.

What if the company name is:
        Company, Inc.
or worse:
        Joe’s “Good, Fast, and Cheap” Food

According to Wikipedia (http://en.wikipedia.org/wiki/Comma-separated_values):

Fields with embedded commas must be enclosed within double-quote characters.

   1997,Ford,E350,"Super, luxurious truck"

Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.

   1997,Ford,E350,"Super ""luxurious"" truck"
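Seen from the writing side, the two quoting rules above can be sketched as follows. This is only a minimal illustration, not code from the Wikipedia article: the class name CsvEscape and the method escape are hypothetical, and it assumes the comma is the only delimiter.

```java
// Hypothetical helper for illustration only.
class CsvEscape {
    // Returns the field as it should appear in a CSV line.
    static String escape(String field) {
        // Quoting is needed for embedded commas and double-quotes.
        // Embedded line breaks also require quoting, so they are checked too.
        boolean needsQuotes = field.contains(",")
                || field.contains("\"")
                || field.contains("\n")
                || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Double every embedded double-quote, then wrap the field in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("Company, Inc."));
        System.out.println(escape("Joe's \"Good, Fast, and Cheap\" Food"));
    }
}
```

Applied to the two company names above, this should print "Company, Inc." and "Joe's ""Good, Fast, and Cheap"" Food", each including its enclosing double-quotes.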

Even worse, quoted fields may have embedded line breaks (newlines; “\n”):

Fields with embedded line breaks must be enclosed within double-quote characters.

   1997,Ford,E350,"Go get one now  
   they are going fast"
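One consequence is that reading a file line by line no longer yields one record per line. A rough sketch of one way to cope, assuming the quoting is well formed: keep appending physical lines until the accumulated text contains an even number of double-quote characters (doubled quotes come in pairs, so they do not change the parity). The names CsvRecordJoiner, quotesBalanced and joinRecords are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; assumes well-formed quoting.
class CsvRecordJoiner {
    // True when s contains an even number of double-quote characters,
    // meaning no quoted field is still open at the end of s.
    static boolean quotesBalanced(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                count++;
            }
        }
        return count % 2 == 0;
    }

    // Joins physical lines into logical CSV records.
    static List<String> joinRecords(List<String> physicalLines) {
        List<String> records = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        boolean open = false;   // true while a quoted field spans lines
        for (String line : physicalLines) {
            if (open) {
                pending.append('\n');   // restore the embedded line break
            }
            pending.append(line);
            open = !quotesBalanced(pending.toString());
            if (!open) {
                records.add(pending.toString());
                pending.setLength(0);
            }
        }
        if (open) {
            // Input ended inside a quoted field; keep what was read.
            records.add(pending.toString());
        }
        return records;
    }
}
```

Given the two physical lines of the example above, joinRecords returns a single record with the line break preserved inside the quoted field.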

The following test demonstrates the problem with using String.split(",") to parse CSV:

The CSV line is:

a,b,c,"Company, Inc.", d, e,"Joe's ""Good, Fast, and Cheap"" Food", f, 10/11/2010,1/1/2011, g, h, i

// Test String.split(",") against CSV with
// embedded commas and embedded double-quotes in
// quoted text strings:
//
// Company names are:
//        Company, Inc.
//        Joe's "Good, Fast, and Cheap" Food
//
// Which should be formatted in a CSV file as:
//        "Company, Inc."
//        "Joe's ""Good, Fast, and Cheap"" Food"
//
//
public class TestSplit {
    // Splits s on splitChar (treated as a regular expression by
    // String.split) and prints each resulting piece on its own line.
    public static void testSplit(String s, String splitChar) {
        String[] segments = s.split(splitChar);

        for (String seg : segments) {
            System.out.println(seg);
        }
    }


    public static void main(String[] args) {
        String csvLine = "a,b,c,\"Company, Inc.\", d,"
                            + " e,\"Joe's \"\"Good, Fast,"
                            + " and Cheap\"\" Food\", f,"
                            + " 10/11/2010,1/1/2011, g, h, i";

        System.out.println("CSV line is:\n" + csvLine + "\n\n");
        testSplit(csvLine, ",");
    }
}

Produces the following:


D:\projects\TestSplit>javac TestSplit.java

D:\projects\TestSplit>java  TestSplit
CSV line is:
a,b,c,"Company, Inc.", d, e,"Joe's ""Good, Fast, and Cheap"" Food", f, 10/11/2010,1/1/2011, g, h, i


a
b
c
"Company
 Inc."
 d
 e
"Joe's ""Good
 Fast
 and Cheap"" Food"
 f
 10/11/2010
1/1/2011
 g
 h
 i

D:\projects\TestSplit>

That CSV line should instead be parsed as:


a
b
c
"Company, Inc."
 d
 e
"Joe's ""Good, Fast, and Cheap"" Food"
 f
 10/11/2010
1/1/2011
 g
 h
 i
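For comparison, here is a minimal sketch of a splitter that gives the listing above by ignoring commas inside quoted text. It is an illustration only, assuming well-formed quoting and a record that sits on a single line; the names QuoteAwareSplit, split and unquote are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class QuoteAwareSplit {
    // Splits a CSV line on commas that are outside double-quotes.
    // Fields are returned as raw text, with quotes and spaces intact.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // A doubled quote toggles twice, leaving the state unchanged.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Strips the enclosing quotes from a quoted field and collapses
    // each doubled quote; unquoted fields are returned unchanged.
    static String unquote(String raw) {
        String t = raw.trim();
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            return t.substring(1, t.length() - 1).replace("\"\"", "\"");
        }
        return raw;
    }
}
```

Calling split on the test line returns 13 fields, and unquote turns the two quoted fields back into the original company names.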
