Java – regular expression finding comments in code

You may have already given up on this by now, but I was intrigued by the problem.

I believe this is a partial solution…

Native regex:

//.*|("(?:\\[^"]|\\"|.)*?")|(?s)/\*.*?\*/

In Java:

String clean = original.replaceAll( "//.*|(\"(?:\\\\[^\"]|\\\\\"|.)*?\")|(?s)/\\*.*?\\*/", "$1 " );

It appears to handle comment markers embedded in string literals correctly, as well as escaped quotes inside strings. I tested it against a few cases, but not exhaustively.
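To make the pieces easier to see, here is a small sketch that writes the same pattern in Pattern.COMMENTS mode, one alternative per line, and runs it over inputs like the ones described above: a comment marker inside a string, escaped quotes, and a block comment spanning two lines. The class name and sample inputs are made up for illustration; the annotated pattern is meant to be equivalent to the one-liner, not a different technique.

```java
import java.util.regex.Pattern;

public class CommentRegexDemo {
    // Same regex as the one-liner, written in COMMENTS mode so each alternative
    // can be annotated. In this mode unescaped whitespace is ignored and '#'
    // starts a comment that runs to the end of the line.
    static final Pattern COMMENT_OR_STRING = Pattern.compile(
          "//.*                             # line comment; . stops at a newline\n"
        + "| ( \"(?:\\\\[^\"]|\\\\\"|.)*?\" )    # group 1: a string literal\n"
        + "| (?s) /\\*.*?\\*/                 # block comment; (?s) lets . cross newlines\n",
        Pattern.COMMENTS);

    // Strings are put back via $1; comments (group 1 unset) become one space.
    static String clean(String original) {
        return COMMENT_OR_STRING.matcher(original).replaceAll("$1 ");
    }

    public static void main(String[] args) {
        String[] samples = {
            "String url = \"http://example.com\"; // trailing comment",
            "String q = \"say \\\"hi\\\" /* not a comment */\";",
            "a = 1; /* spans\n   two lines */ b = 2;",
        };
        for (String s : samples) {
            System.out.println("[" + clean(s) + "]");
        }
    }
}
```

Note the extra space after each string literal in the output; that is the compromise discussed next.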

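There is one compromise: every string literal in the code ends up with a space after it, because the "$1 " replacement adds a space after every match, whether it was a string or a comment. Keeping this simple while fixing that would be very difficult, given the need to cleanly handle a comment wedged between two tokens, where the comment has to become a space rather than simply disappear: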

int/* some comment */foo = 5;
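A quick way to see why the replacement includes a space: with "$1" alone, the comment vanishes and the two tokens fuse together. A minimal sketch (the class name WhySpace is just for illustration):

```java
public class WhySpace {
    // The same regex as the one-liner above.
    static final String REGEX =
        "//.*|(\"(?:\\\\[^\"]|\\\\\"|.)*?\")|(?s)/\\*.*?\\*/";

    public static void main(String[] args) {
        String code = "int/* some comment */foo = 5;";
        // Dropping the comment outright fuses the two tokens together.
        System.out.println(code.replaceAll(REGEX, "$1"));   // intfoo = 5;
        // Replacing it with a space keeps them apart.
        System.out.println(code.replaceAll(REGEX, "$1 "));  // int foo = 5;
    }
}
```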

A simple Matcher.find/appendReplacement loop could check group(1) and replace only the comments with a space. It would take just a handful of lines of code and would probably still be simpler than a full-blown parser. (I could add the matcher loop too if anyone is interested.)
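Here is a sketch of that loop, assuming Java 9 or later (for the StringBuilder overloads of appendReplacement and appendTail). String literals are copied back unchanged and only comments become a space, so strings no longer pick up a trailing space. The class and method names are hypothetical. Matcher.quoteReplacement is used because appendReplacement would otherwise treat a '$' or '\' inside a string literal as a group reference or escape.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentStripper {
    // Same regex as the one-liner; group 1 captures string literals.
    private static final Pattern COMMENT_OR_STRING = Pattern.compile(
        "//.*|(\"(?:\\\\[^\"]|\\\\\"|.)*?\")|(?s)/\\*.*?\\*/");

    static String stripComments(String original) {
        Matcher m = COMMENT_OR_STRING.matcher(original);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // group(1) is non-null only when the match was a string literal:
            // keep the string as-is, replace a comment with a single space.
            String replacement = (m.group(1) != null) ? m.group(1) : " ";
            // Quote it so '$' and '\' in the literal are copied verbatim.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("int/* some comment */foo = 5;"));
        System.out.println(stripComments("String s = \"a // b\"; // gone"));
    }
}
```

It is the same regex underneath, so it shares the limits of the partial solution above; it only removes the extra space after strings.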
