Split a String at every 3rd comma in Java

NOTE: although the split-based solution may work (last tested on Java 17), it relies on a bug, because look-behind in Java is supposed to have an obvious maximum length. That limitation should in theory prevent us from using + inside it, but somehow \G at the start lets us use + here. If this bug is fixed in the future, the split solution will stop working.

A safer approach is to use Matcher#find, like this:

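import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "0,0,1,2,4,5,3,4,6";
Pattern p = Pattern.compile("\\d+,\\d+,\\d+"); // no look-behind needed
Matcher m = p.matcher(data);
List<String> parts = new ArrayList<>();
while (m.find()) {
    parts.add(m.group()); // each match is one group of three numbers
}
String[] result = parts.toArray(new String[0]);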
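
If the group size should not be hard-coded, the same Matcher#find idea can be generalized. The following is only a sketch: the class name EveryNthComma and the method splitEveryNthComma are made up for illustration, and it assumes every field is a non-empty run of digits. A trailing group with fewer than n numbers is kept rather than dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EveryNthComma {
    // Hypothetical helper: collects groups of up to n comma-separated numbers.
    static List<String> splitEveryNthComma(String data, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // One number, followed by at most n-1 further ",number" pairs (greedy).
        Pattern p = Pattern.compile("\\d+(?:,\\d+){0," + (n - 1) + "}");
        Matcher m = p.matcher(data);
        List<String> parts = new ArrayList<>();
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        // Groups of three: 0,0,1 then 2,4,5 then 3,4,6
        System.out.println(splitEveryNthComma("0,0,1,2,4,5,3,4,6", 3));
        // Groups of four, with a shorter trailing group: 1,2,3,4 then 5,6
        System.out.println(splitEveryNthComma("1,2,3,4,5,6", 4));
    }
}
```

Because the separating comma can never start a match, find() simply skips it and continues with the next number, so no look-behind is involved.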

You can try using the split method with the regex (?<=\\G\\d+,\\d+,\\d+),

Demo

String data = "0,0,1,2,4,5,3,4,6";
String[] array = data.split("(?<=\\G\\d+,\\d+,\\d+),"); // Magic :)
// to see how the magic works, read the explanation below
for(String s : array){
    System.out.println(s);
}

output:

0,0,1
2,4,5
3,4,6

Explanation

  • \\d means one digit, same as [0-9], like 0 or 3
  • \\d+ means one or more digits like 1 or 23
  • \\d+, means one or more digits with comma after it, like 1, or 234,
  • \\d+,\\d+,\\d+ will accept three numbers with commas between them like 12,3,456
  • \\G matches the end of the previous match or, if there is none (as on the first use), the start of the string
  • (?<=...), is a positive look-behind followed by a comma: it matches a comma , that is preceded by the text described inside (?<=...)
  • (?<=\\G\\d+,\\d+,\\d+), will therefore try to find a comma that has three numbers before it, where those numbers are preceded either by the start of the string (like ^0,0,1 in your example) or by the previously matched comma (like 2,4,5 and 3,4,6).
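
To see what \\G contributes on its own, here is a small sketch that uses it with Matcher#find instead of inside a look-behind. The class AnchoredGroups and the method contiguousTriples are hypothetical names, and the trailing (?:,|$) is my addition so that each match also consumes the separating comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredGroups {
    // Hypothetical helper: collects groups of three numbers, each of which must
    // begin exactly where the previous match ended (that is what \G enforces).
    static List<String> contiguousTriples(String data) {
        // \G            : start of input on the first find(), then end of the previous match
        // (\d+,\d+,\d+) : three numbers separated by commas (captured as group 1)
        // (?:,|$)       : consume the separating comma, or accept the end of input
        Pattern p = Pattern.compile("\\G(\\d+,\\d+,\\d+)(?:,|$)");
        Matcher m = p.matcher(data);
        List<String> parts = new ArrayList<>();
        while (m.find()) {
            parts.add(m.group(1));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Three groups: 0,0,1 then 2,4,5 then 3,4,6
        System.out.println(contiguousTriples("0,0,1,2,4,5,3,4,6"));
        // The stray "x" breaks the chain, so only the first group 0,0,1 is found.
        System.out.println(contiguousTriples("0,0,1,x,2,4,5"));
    }
}
```

Without \\G the second call would also pick up 2,4,5, because find() would be free to restart anywhere after the x. With \\G every match has to continue from the end of the previous one.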

Also, if you want to allow characters other than digits, you can use a different character class, such as

  • \\w which will match alphabetic characters, digits and _
  • \\S everything that is not white space
  • [^,] everything that is not comma
  • … and so on. More info is in the Pattern documentation
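
As a rough illustration of swapping the character class, the sketch below plugs a token regex into the Matcher#find form of the pattern. TokenClasses and triples are made-up names, and the sample inputs are mine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenClasses {
    // Hypothetical helper: groups of three fields, where `token` is the regex for one field.
    static List<String> triples(String data, String token) {
        Pattern p = Pattern.compile(token + "," + token + "," + token);
        Matcher m = p.matcher(data);
        List<String> parts = new ArrayList<>();
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        // \w+ accepts letters, digits and _ : groups are ab,c_1,42 and x,y,z
        System.out.println(triples("ab,c_1,42,x,y,z", "\\w+"));
        // [^,]+ accepts anything except a comma: groups are "a b,c-d,e" and "f,g,h"
        System.out.println(triples("a b,c-d,e,f,g,h", "[^,]+"));
    }
}
```

One thing to watch in this Matcher form: \\S also matches the comma itself, so a greedy \\S+ can swallow more than three fields in a single match. [^,] does not have that problem.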

By the way, this form works with split on every 3rd, 5th, 7th (and any other odd-numbered) comma. For example, split("(?<=\\G\\w+,\\w+,\\w+,\\w+,\\w+),") will split on every 5th comma.

To split on every 2nd, 4th, 6th, 8th (and any other even-numbered) comma, you need to replace + with {1,maxLengthOfNumber}. For example, split("(?<=\\G\\w{1,3},\\w{1,3},\\w{1,3},\\w{1,3}),") will split on every 4th comma when the numbers have at most 3 digits (0, 00, 12, 000, 123, 412, 999).

To split on every 2nd comma you can also use the regex split("(?<!\\G\\d+),"), which is based on my previous answer.
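
If you would rather not depend on look-behind behaviour at all, a plain loop treats odd and even counts the same way. This is only a sketch with made-up names (CommaChunker, chunk). Unlike split, it keeps a trailing empty string when the input ends right after an n-th comma.

```java
import java.util.ArrayList;
import java.util.List;

class CommaChunker {
    // Hypothetical helper: cuts data at every n-th comma without using any regex.
    static List<String> chunk(String data, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> parts = new ArrayList<>();
        int start = 0; // index where the current chunk begins
        int seen = 0;  // commas seen so far inside the current chunk
        for (int i = 0; i < data.length(); i++) {
            if (data.charAt(i) == ',' && ++seen == n) {
                parts.add(data.substring(start, i)); // the n-th comma itself is dropped
                start = i + 1;
                seen = 0;
            }
        }
        parts.add(data.substring(start)); // last chunk, possibly shorter than n fields
        return parts;
    }

    public static void main(String[] args) {
        // Every 3rd comma: 0,0,1 then 2,4,5 then 3,4,6
        System.out.println(chunk("0,0,1,2,4,5,3,4,6", 3));
        // Every 4th comma: 1,2,3,4 then 5,6,7,8
        System.out.println(chunk("1,2,3,4,5,6,7,8", 4));
    }
}
```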
