Definition:
Hashing is the application of a function f() to a variable-sized input to produce a fixed-size output.
A => f() => X
B => f() => Y
C => f() => Z
A hash is also a one-way function, which means that there isn’t a function to reverse or undo a hash. Likewise, re-applying the hash, f(f(x)), isn’t going to produce x again.
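To make the definition concrete, here is a minimal sketch in Python. It assumes SHA-256 from the standard library as a stand-in for f(); the name f and the sample inputs are only for illustration.

```python
import hashlib

def f(data: bytes) -> bytes:
    """Stand-in for f(): SHA-256 is an assumption, any hash would do."""
    return hashlib.sha256(data).digest()

# Inputs of very different sizes.
inputs = [b"A", b"a much longer input string", b"x" * 10_000]

for item in inputs:
    digest = f(item)
    # The input lengths differ, but every digest is 32 bytes long.
    print(len(item), len(digest))

x = b"A"
# Re-applying the hash does not get back to the original input.
print(f(f(x)) == x)  # False
```

Whatever goes in, 32 bytes come out, and hashing the hash just produces yet another 32-byte value.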
The Details:
A hash function can be as simple as “add 13 to the input” or as complex as a cryptographic hash such as MD5 or SHA-1. Many properties make for a good hash function, including:
- Low Cost: Easy to compute
- Deterministic: if I hash the input a multiple times, I am going to get the same output each time
- Uniformity: The inputs will be evenly distributed among the possible outputs. This falls in line with something called the Pigeonhole Principle: there are more possible inputs than outputs, so some inputs must share an output. Since there are a limited number of outputs, we want f() to spread the inputs evenly across them instead of placing them all in the same bucket. When two inputs compute to the same output, this is known as a collision. A good hash function produces fewer collisions.
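The properties above can be sketched with a deliberately tiny output space. This is an illustration under assumptions: the bucket function, the NUM_BUCKETS value, and the "key-N" inputs are hypothetical, and SHA-256 is only used as a convenient well-mixed function.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 8  # hypothetical, deliberately tiny output space

def bucket(key: str) -> int:
    """Map a string to one of NUM_BUCKETS possible outputs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_BUCKETS

# Deterministic: the same input always lands in the same bucket.
assert bucket("a") == bucket("a")

# Uniformity: count how many of 8,000 inputs land in each bucket.
counts = Counter(bucket(f"key-{i}") for i in range(8000))
print(sorted(counts.items()))

# Pigeonhole Principle: 9 inputs into 8 buckets guarantees
# that at least two inputs collide.
nine = [bucket(f"key-{i}") for i in range(9)]
assert len(set(nine)) < len(nine)
```

With a uniform function the counts should each come out near 1,000. A poor function would pile most inputs into a few buckets and collide far more often.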
Hashing applied to Passwords:
The hashing of passwords is the same process as described above; however, it comes with some special considerations. Many of the properties that make up a good hash function are not beneficial when it comes to passwords.
Take determinism, for example: because hashes produce a deterministic result, when two people use the same password the hash is going to look the same in the password store. This is a bad thing! However, this is mitigated by something called a salt: a random value, unique to each user, that is mixed into the password before hashing.
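Here is a rough sketch of why a salt helps. It assumes PBKDF2 via Python's hashlib.pbkdf2_hmac as the password hash, which is my choice for the example rather than anything prescribed above; the iteration count, the user names, and the password are all made up.

```python
import hashlib
import os

ITERATIONS = 100_000  # assumed work factor, for illustration only

def hash_password(password: str, salt: bytes) -> bytes:
    """Hash a password together with a salt (PBKDF2 is an assumption)."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )

# Without a salt, two users with the same password get the same hash.
unsalted_alice = hashlib.sha256(b"hunter2").hexdigest()
unsalted_bob = hashlib.sha256(b"hunter2").hexdigest()
print(unsalted_alice == unsalted_bob)  # True

# With a random salt per user, the stored hashes differ.
salt_alice = os.urandom(16)
salt_bob = os.urandom(16)
stored_alice = hash_password("hunter2", salt_alice)
stored_bob = hash_password("hunter2", salt_bob)
print(stored_alice == stored_bob)  # False
```

The salt is stored next to the hash. It isn't a secret; its job is to make identical passwords look different in the password store.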
Uniformity, on the other hand, is beneficial because the goal is for the algorithm to limit collisions.
Because a hash is one-way, the input cannot be determined from the output, which is why hashing is great for passwords!
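Since the hash can't be reversed, checking a login means hashing the attempt and comparing the results. The sketch below is one way that might look; the register and login helpers and the record layout are hypothetical, and PBKDF2 with 100,000 iterations is again an assumption.

```python
import hashlib
import hmac
import os

def derive(password: str, salt: bytes) -> bytes:
    # PBKDF2 and the iteration count are assumptions for this sketch.
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, 100_000
    )

def register(password: str) -> dict:
    """Store only the salt and the hash, never the password itself."""
    salt = os.urandom(16)
    return {"salt": salt, "hash": derive(password, salt)}

def login(record: dict, attempt: str) -> bool:
    """Hash the attempt with the stored salt and compare the results."""
    candidate = derive(attempt, record["salt"])
    # compare_digest compares in constant time.
    return hmac.compare_digest(candidate, record["hash"])

record = register("correct horse")
print(login(record, "correct horse"))  # True
print(login(record, "wrong guess"))    # False
```

Nothing in the stored record lets the server recover the password. It can only confirm whether a given guess hashes to the same value.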