When is \G useful in a regex?

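\G is an anchor: it forces the match to start at one specific position, namely pos() of the string, which is where the previous /g match on that string ended (or the start of the string if there has been none). When \G is present, the regex can’t start matching at some arbitrary later point in the string; when \G is absent, it can.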
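To make the difference concrete, here is a minimal sketch that tries the same follow-up match with and without \G. The sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $string = "aaa bbb";

# A scalar-context /g match records where it ended in pos($string).
$string =~ /a+/g;
my $after_first = pos($string);

# With \G the match must begin exactly at pos($string), where a space sits,
# so this fails. /c keeps pos($string) from being reset by the failure.
my $anchored       = $string =~ /\Gb+/gc ? 1 : 0;
my $after_anchored = pos($string);

# Without \G the regex is free to skip the space and find "bbb" further on.
my $unanchored       = $string =~ /b+/gc ? 1 : 0;
my $after_unanchored = pos($string);

print "pos after first match: $after_first\n";
print "anchored matched: $anchored, pos now $after_anchored\n";
print "unanchored matched: $unanchored, pos now $after_unanchored\n";
```

The anchored attempt fails and leaves pos at 3, while the unanchored one skips the space, matches "bbb" and moves pos to 7.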

\G is most useful when parsing a string into discrete parts, where you don’t want the regex to skip past anything it doesn’t recognize. For instance:

my $string = " a 1 # ";
# /g advances pos($string) after each match; /c leaves pos alone when a match fails
while (1) {
    if ( $string =~ /\G\s+/gc ) {
        print "whitespace\n";
    }
    elsif ( $string =~ /\G[0-9]+/gc ) {
        print "integer\n";
    }
    elsif ( $string =~ /\G\w+/gc ) {
        print "word\n";
    }
    else {
        print "done\n";
        last;
    }
}

Output with the \G anchors in place:

whitespace
word
whitespace
integer
whitespace
done

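Without them, each pattern is free to skip ahead to the next place it can match, so the "a", "1" and "#" are silently passed over: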

whitespace
whitespace
whitespace
whitespace
done

Note that I am demonstrating with scalar-context /g matching, but \G applies equally to list-context /g matching, and the code above is trivially modified to use it (this version simply stops at the first unrecognized character instead of printing "done"):

my $string = " a 1 # ";
# each successful match contributes three elements, one per capture group (undef if unused)
my @matches = $string =~ /\G(?:(\s+)|([0-9]+)|(\w+))/g;
while ( my ($whitespace, $integer, $word) = splice @matches, 0, 3 ) {
    if ( defined $whitespace ) {
        print "whitespace\n";
    }
    elsif ( defined $integer ) {
        print "integer\n";
    }
    elsif ( defined $word ) {
        print "word\n";
    }
}
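
Building on the scalar-context loop, here is a rough sketch of the same idea wrapped in a sub that also keeps the matched text and reports where scanning stopped. The name tokenize and its return shape are my own invention for illustration, not anything built in.

```perl
use strict;
use warnings;

# Hypothetical helper: split $string into [type, text] pairs, stopping at
# the first character that none of the patterns recognizes.
sub tokenize {
    my ($string) = @_;
    my @tokens;
    pos($string) = 0;    # start scanning at the beginning
    while (1) {
        if    ( $string =~ /\G(\s+)/gc )    { push @tokens, [ whitespace => $1 ] }
        elsif ( $string =~ /\G([0-9]+)/gc ) { push @tokens, [ integer    => $1 ] }
        elsif ( $string =~ /\G(\w+)/gc )    { push @tokens, [ word       => $1 ] }
        else                                { last }
    }
    # Thanks to /c, pos() still marks the end of the last successful match.
    return ( \@tokens, pos($string) );
}

my $input = " a 1 # ";
my ( $tokens, $stopped_at ) = tokenize($input);
print "$_->[0]: '$_->[1]'\n" for @$tokens;
print "stopped at offset $stopped_at of ", length($input), "\n";
```

If the returned offset is less than the length of the input, the character at that offset is the first one the tokenizer could not classify (here, the "#" at offset 5).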
