How can it be impossible to “decrypt” an MD5 hash? [duplicate]

Basically it’s because the output of MD5 contains less information than the input. This is basically what distinguishes a hash algorithm from an encryption algorithm.

Here’s a simple example: imagine an algorithm to compute the hash of a 10-digit number. The algorithm is “return the last 2 digits.” If I take the hash of 8023798734, I get 34, but if all you had is the 34, you would have no way to tell what the original number is because the hashing algorithm discarded 8 digits worth of information. It’s similar with MD5, except that the hash is computed via a complex procedure instead of just chopping off part of the data.

So then how can one hash be better than another? For one thing, different hash algorithms can be more or less resistant to collisions (when two inputs produce the same output). The probability of a collision is inversely related to the number of possible hash outputs. Collisions are an undesirable feature of hashes because if your data changes, you want the hash to change too, so one way to get a better hash algorithm is to use a hash with more possible outputs. In the digits example above, taking the last 4 digits instead of the last 2 digits reduces the probability of a collision with a given hash (technically called a preimage) to 1 in 10000 instead of 1 in 100, so it’s more likely that all the 10-digit numbers in whatever set you have will have different hash values.

There’s also the issue of cryptographic security. When you want to use a hash to make sure that some data is not tampered with, it’s desirable that whoever’s doing the tampering can’t predict what inputs will produce a given output. If they could, they would be able to alter the input data in such a way that the output (the hash) remains the same. Going back to the digits example again, let’s say I’m going to email you the number 1879483129 and it is critically important that this number gets to you unaltered. I might call you up and tell you the hash of the number, which would be 29, but since the “last 2 digits” algorithm is not cryptographically secure, a nefarious hacker could change the number en route to, say, 5555555529 and you wouldn’t know the difference.

It’s been shown that MD5 is not cryptographically secure (and SHA-1 is also compromised). That means that it is possible to find different inputs which correspond to any given output. It’s still a fine algorithm for protecting against random bit flips and the like, but if there’s a chance someone might want to intentionally corrupt your data, you should really use something more secure, like SHA-256 or greater, probably as part of an HMAC scheme.

Leave a Comment