In Git, what is the difference between long and short hashes?

To see why the short hash is useful, and why you often don’t need the long hash, it helps to look at how Git stores things.

An object such as c26cf8af130955c5c67cfea96f9532680b963628 will be stored in one of two places. It could be in the file .git/objects/c2/6cf8af130955c5c67cfea96f9532680b963628. Note that the first two characters, c2, make up a directory and the rest is the filename. Since many filesystems don’t perform well when there are too many files in one directory, this keeps any one directory from holding too many files and keeps this little directory database efficient.
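As a rough illustration of that layout, here is a small Python sketch that turns a full hash into the path where a loose object would live. The function name loose_object_path and the git_dir parameter are made up for this example; this is not Git's own code, and it only covers the loose-object case described above.

```python
from pathlib import Path


def loose_object_path(full_hash: str, git_dir: str = ".git") -> Path:
    """Return the path a loose object with this hash would have.

    Hypothetical helper for illustration: the first two hex characters
    name the subdirectory, the remaining 38 name the file.
    """
    if len(full_hash) != 40:
        raise ValueError("expected a 40-character SHA-1 hash")
    return Path(git_dir) / "objects" / full_hash[:2] / full_hash[2:]


if __name__ == "__main__":
    print(loose_object_path("c26cf8af130955c5c67cfea96f9532680b963628").as_posix())
```

Whether a file actually exists at that path depends on the repository, since the object may live in a packfile instead.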

With just the short hash, c26cf8a, Git can do the equivalent of .git/objects/c2/6cf8a* and that’s likely to match a single file. Since the objects are divided into subdirectories, there aren’t too many filenames to look through to check whether there’s more than one match.
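The prefix lookup can be sketched in Python along these lines. This is only an approximation of the idea, not how Git itself is implemented: find_loose_objects is a hypothetical name, and the sketch looks only at loose objects and ignores packfiles entirely.

```python
from pathlib import Path


def find_loose_objects(prefix: str, git_dir: str = ".git") -> list:
    """Return the full hashes of loose objects that start with `prefix`.

    Hypothetical helper: only one subdirectory is scanned, because the
    first two characters of the prefix already pick the directory.
    """
    if len(prefix) < 2:
        raise ValueError("need at least the two directory characters")
    subdir = Path(git_dir) / "objects" / prefix[:2]
    if not subdir.is_dir():
        return []
    rest = prefix[2:]
    # Equivalent of the glob .git/objects/<first two>/<rest>*
    return sorted(
        prefix[:2] + entry.name
        for entry in subdir.iterdir()
        if entry.name.startswith(rest)
    )


if __name__ == "__main__":
    matches = find_loose_objects("c26cf8a")
    if len(matches) == 1:
        print("unique match:", matches[0])
    elif matches:
        print("ambiguous prefix, candidates:", matches)
    else:
        print("no loose object with that prefix")
```

One match means the short hash is unambiguous among the loose objects; more than one means a longer prefix is needed.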

c26cf8a alone allows enough possibilities, 16^7 or 2^28 or 268,435,456, that in a typical repository it’s very unlikely another object will share that prefix.
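To put a number on "very unlikely", here is a back-of-the-envelope Python sketch. It assumes hashes are spread uniformly over the prefix space, and the function names and the object count of 10,000 are picked purely for illustration.

```python
def prefix_space(hex_digits: int) -> int:
    """Number of distinct prefixes of the given length (16 values per hex digit)."""
    return 16 ** hex_digits


def chance_prefix_is_shared(other_objects: int, hex_digits: int = 7) -> float:
    """Probability that at least one of `other_objects` objects shares one
    particular prefix, assuming uniformly distributed hashes."""
    space = prefix_space(hex_digits)
    return 1.0 - (1.0 - 1.0 / space) ** other_objects


if __name__ == "__main__":
    print(prefix_space(7))                  # 268435456, the same as 2 ** 28
    print(chance_prefix_is_shared(10_000))  # chance for one given 7-character prefix
```

The chance grows with the number of objects, which is why a very large repository may need a few more characters to stay unambiguous.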

Basically, Git uses the filesystem itself as a simple key/value store, and it can look up partial keys without having to scan the whole list of keys.

That’s one way to store objects. As a repository grows, more and more of its objects end up in packfiles instead. It’s a very efficient way to store just the changes between files. From time to time, Git will examine what’s in .git/objects and pack those objects, storing just the differences, into .git/objects/pack/pack-<checksum>.pack, with a matching .idx index file alongside it.

That’s a binary format. I’m not going to get into it here, and I don’t understand it myself anyway. 🙂
