split a comma-separated string with both quoted and unquoted strings [duplicate]

Depending on your needs you may not be able to use a csv parser, and may in fact want to re-invent the wheel!!

You can do so with some simple regex

(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)

This will do the following:

(?:^|,) = Match expression “Beginning of line or string ,

(\"(?:[^\"]+|\"\")*\"|[^,]*) = A numbered capture group, this will select between 2 alternatives:

  1. stuff in quotes
  2. stuff between commas

This should give you the output you are looking for.

Example code in C#

 static Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);

public static string[] SplitCSV(string input)
{

  List<string> list = new List<string>();
  string curr = null;
  foreach (Match match in csvSplit.Matches(input))
  {        
    curr = match.Value;
    if (0 == curr.Length)
    {
      list.Add("");
    }

    list.Add(curr.TrimStart(','));
  }

  return list.ToArray();
}

private void button1_Click(object sender, RoutedEventArgs e)
{
    Console.WriteLine(SplitCSV("111,222,\"33,44,55\",666,\"77,88\",\"99\""));
}

Warning As per @MrE’s comment – if a rogue new line character appears in a badly formed csv file and you end up with an uneven (“string) you’ll get catastrophic backtracking (https://www.regular-expressions.info/catastrophic.html) in your regex and your system will likely crash (like our production system did). Can easily be replicated in Visual Studio and as I’ve discovered will crash it. A simple try/catch will not trap this issue either.

You should use:

(?:^|,)(\"(?:[^\"])*\"|[^,]*)

instead

Leave a Comment