Exclude characters from a character class

It really depends on your regex flavor.

.NET

… provides only one simple character class set operation: subtraction. This is enough for your example, so you can simply use

[\w-[_]]

If a - is followed by a nested character class, that nested class is subtracted from the outer one. Simple as that…

Java

… provides a much richer set of character class set operations. In particular, you can take the intersection of two sets, as in [[abc]&&[cde]] (which matches only c in this case). Intersection and negation together give you subtraction:

[\w&&[^_]]
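To see the Java operators in action, here is a small sketch using java.util.regex. It assumes Java's default (ASCII-only) meaning of \w, i.e. [a-zA-Z_0-9]. The class name ClassSubtraction and the helper keep are hypothetical, chosen only for this illustration: keep tests each character of the input on its own and returns the ones the class accepts.

```java
import java.util.regex.Pattern;

public class ClassSubtraction {
    // [[abc]&&[cde]]: the intersection of {a, b, c} and {c, d, e}, so only 'c'.
    static final Pattern INTERSECTION = Pattern.compile("[[abc]&&[cde]]");

    // [\w&&[^_]]: characters that are in \w AND not '_', i.e. \w minus '_'.
    static final Pattern WORD_EXCEPT_UNDERSCORE = Pattern.compile("[\\w&&[^_]]");

    // Hypothetical helper: tests each character of the input on its own
    // and keeps those the character class accepts.
    static String keep(Pattern p, String input) {
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            String ch = input.substring(i, i + 1);
            if (p.matcher(ch).matches()) {
                kept.append(ch);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(keep(INTERSECTION, "abcde"));             // c
        System.out.println(keep(WORD_EXCEPT_UNDERSCORE, "a_b-9_Z")); // ab9Z
    }
}
```

Running main should print c and then ab9Z: the - is dropped because it is not a word character, and both underscores are removed by the intersection with [^_].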

Perl

… supports set operations on extended character classes as an experimental feature (available since Perl 5.18). In particular, you can directly subtract arbitrary character classes:

(?[ \w - [_] ])

All other flavors

… (that support lookaheads) allow you to mimic the subtraction by using a negative lookahead:

(?!_)\w

This first checks that the next character is not a _ and then matches any \w (which can’t be _ due to the negative lookahead).
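Java also supports lookaheads, so the same idea can be sketched there, again assuming the default ASCII \w. LookaheadSubtraction, WORD_RUNS and findAll are hypothetical names for this illustration. The second pattern shows a detail that is easy to get wrong: to repeat the mimicked class, the lookahead has to sit inside the repeated group, so it is checked before every character rather than only once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadSubtraction {
    // (?!_)\w : the negative lookahead fails if the next character is '_';
    // otherwise \w consumes exactly one word character.
    static final Pattern WORD_EXCEPT_UNDERSCORE = Pattern.compile("(?!_)\\w");

    // To repeat the subtracted class, keep the lookahead inside the group
    // so it is re-checked before every character: (?:(?!_)\w)+
    static final Pattern WORD_RUNS = Pattern.compile("(?:(?!_)\\w)+");

    // Hypothetical helper: collects every non-overlapping match, left to right.
    static List<String> findAll(Pattern p, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll(WORD_EXCEPT_UNDERSCORE, "a_1")); // [a, 1]
        System.out.println(findAll(WORD_RUNS, "snake_case_42"));    // [snake, case, 42]
    }
}
```

With the lookahead placed outside the group, as in (?!_)\w+, only the first character would be checked, so a run like a_b would still match in full.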

Note that each of these approaches is completely general in that you can subtract two arbitrarily complex character classes.
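To illustrate that generality with something less trivial than _, the sketch below subtracts the class [\d_] from \w in the two ways Java supports (intersection with a negated class, and a negative lookahead) and checks that both accept exactly the same single ASCII characters. It assumes Java's default ASCII \w; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class GeneralSubtraction {
    // \w minus [\d_], written as intersection with a negated class.
    static final Pattern INTERSECT = Pattern.compile("[\\w&&[^\\d_]]");

    // The same subtraction, mimicked with a negative lookahead.
    static final Pattern LOOKAHEAD = Pattern.compile("(?![\\d_])\\w");

    // Counts how many single ASCII characters (0..127) the pattern matches in full.
    static int countAscii(Pattern p) {
        int count = 0;
        for (char c = 0; c < 128; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                count++;
            }
        }
        return count;
    }

    // True if both patterns accept exactly the same single ASCII characters.
    static boolean agreeOnAscii() {
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (INTERSECT.matcher(s).matches() != LOOKAHEAD.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnAscii());        // true
        System.out.println(countAscii(INTERSECT)); // 52
        System.out.println(countAscii(LOOKAHEAD)); // 52
    }
}
```

Both should report 52 accepted characters: the 63 characters of ASCII \w minus the 10 digits and the underscore, which leaves the upper- and lowercase letters.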
