A Regex to match a SHA1

You can treat SHA1 hashes as effectively random, so this reduces to a matter of probabilities. The probability that a given hex digit is a letter (a-f) rather than a decimal digit is 6/16, or 0.375. The probability that three SHA1 digits are all letters is 0.375 ** 3, or 0.0527 (about 5%). At five digits, the probability of all letters drops below 1%, to about 0.0074 (you said you wanted to match 99% of the time). At six digits, it falls further to 0.00278 (about 0.3%).
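The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes each hex digit is uniformly random and independent of the others, and the function name is made up for illustration.

```python
# Probability that a single hex digit is a letter (a-f) rather than 0-9.
P_LETTER = 6 / 16


def all_letters_probability(n):
    """Chance that n random hex digits contain no decimal digit at all."""
    return P_LETTER ** n


# Print the probability for a few prefix lengths.
for n in range(3, 8):
    print(f"{n} digits: {all_letters_probability(n):.5f}")
```

The five-digit row is the first one to fall under 0.01, which is where the 99% figure comes from.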

It’s easy to craft a regular expression that always matches SHA1 values:

\b[0-9a-f]{5,40}\b
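As a rough illustration, here is how that pattern might be used from Python's re module. The sample text and the helper name find_sha1_candidates are invented for the example; the full-length hash is the well-known SHA1 of the empty string.

```python
import re

# The pattern from above: 5 to 40 lowercase hex digits, delimited by
# word boundaries.
SHA1_RE = re.compile(r"\b[0-9a-f]{5,40}\b")


def find_sha1_candidates(text):
    """Return every substring that looks like a full or abbreviated SHA1."""
    return SHA1_RE.findall(text)


sample = "Fixed in 3f2a9c1, see also da39a3ee5e6b4b0d3255bfef95601890afd80709."
print(find_sha1_candidates(sample))
```

Note that the pattern only accepts lowercase hex; pass re.IGNORECASE to re.compile if uppercase hashes can appear in your input.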

However, this may also match perfectly good five-letter words, like “added” or “faded”. In my /usr/share/dict/words file, there are several six-letter words that would match: “accede”, “beaded”, “bedded”, “decade”, “deface”, “efface”, and “facade” are the most likely. At seven letters, there is only “deedeed”, which is unlikely to appear in prose. It all depends on how many false positives you can tolerate, and which words you are actually likely to encounter.
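If those false positives matter, one possible trade-off (my assumption, not something the pattern above does) is to add a lookahead that requires at least one decimal digit. That rejects every dictionary word, at the cost of missing the rare all-letter hash, which by the probabilities above happens less than 1% of the time for five or more digits. A sketch, with a hypothetical name for the stricter pattern:

```python
import re

# Hypothetical stricter variant: the lookahead (?=[a-f]*[0-9]) insists
# that a decimal digit appears somewhere in the run of hex characters.
STRICT_SHA1_RE = re.compile(r"\b(?=[a-f]*[0-9])[0-9a-f]{5,40}\b")

# The dictionary words mentioned above; none of them contains a digit.
WORDS = ["added", "faded", "accede", "beaded", "bedded", "decade",
         "deface", "efface", "facade", "deedeed"]


def find_strict(text):
    """Return hex runs of 5-40 characters that contain at least one digit."""
    return STRICT_SHA1_RE.findall(text)


print([w for w in WORDS if find_strict(w)])  # words that still match
print(find_strict("She added a facade in commit 3f2a9c1."))
```

An all-letter abbreviation such as a hypothetical "abcdef" would be skipped too, so this only makes sense when a small miss rate is acceptable.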
