How does grep run so fast?

Assuming your question regards GNU grep specifically. Here’s a note from the author, Mike Haertel:

GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE.

GNU grep is fast because it EXECUTES VERY FEW INSTRUCTIONS FOR EACH
BYTE that it
does look at.

GNU grep uses the well-known Boyer-Moore algorithm, which looks first
for the final letter of the target string, and uses a lookup table to
tell it how far ahead it can skip in the input whenever it finds a
non-matching character.

GNU grep also unrolls the inner loop of Boyer-Moore, and sets up the
Boyer-Moore delta table entries in such a way that it doesn’t need to
do the loop exit test at every unrolled step. The result of this is
that, in the limit, GNU grep averages fewer than 3 x86 instructions
executed for each input byte it actually looks at (and it skips many
bytes entirely).

GNU grep uses raw Unix input system calls and avoids copying data
after reading it. Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO
LINES. Looking for newlines would slow grep down by a factor of
several times, because to find the newlines it would have to look at
every byte!

So instead of using line-oriented input, GNU grep reads raw data into
a large buffer, searches the buffer using Boyer-Moore, and only when
it finds a match does it go and look for the bounding newlines
(Certain command line options like
-n disable this optimization.)

This answer is a subset of the information taken from here.

Leave a Comment