How does git track source code moved between files?

It doesn’t track moves. That’s the beauty of it.

Git only records snapshots of the entire project tree: here’s what all files looked like before the commit and here’s what they look like after. How we got from here to there, Git doesn’t care.

This allows intelligent tools to be written after a commit has already happened, to extract information from that commit. For example, rename detection in Git is done by comparing all deleted files against all new files and computing a pairwise similarity metric. If the similarity is greater than x, the pair is considered a rename; if it is between y and x (y < x), it is considered a rename+edit; and if it is below y, the files are considered independent. The cool thing is that you, as a “commit archaeologist”, can specify after the fact what x and y should be. This would not work if the commit simply recorded “this file is a rename of that file”.
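To make the idea concrete, here is a minimal sketch of after-the-fact rename classification. It is not Git's actual implementation: the function names, the use of Python's `difflib` as the similarity metric, and the default values for `x` and `y` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1]; a stand-in for Git's own metric."""
    return SequenceMatcher(None, a, b).ratio()


def classify_pairs(deleted, added, x=0.9, y=0.5):
    """Classify each deleted file against its best-matching added file.

    deleted / added: dicts mapping path -> file content for one commit.
    x, y: hypothetical thresholds, chosen by the reader, not stored anywhere.
    Returns a list of (old_path, new_path, kind, score) tuples.
    """
    results = []
    for old_path, old_text in deleted.items():
        best_path, best_score = None, 0.0
        # Pairwise comparison: every deleted file against every added file.
        for new_path, new_text in added.items():
            score = similarity(old_text, new_text)
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is None or best_score < y:
            kind = "independent"
        elif best_score > x:
            kind = "rename"
        else:
            kind = "rename+edit"
        results.append((old_path, best_path, kind, best_score))
    return results
```

Because the thresholds are ordinary arguments, the same pair of snapshots can be re-examined with different values of x and y without touching the history. Git exposes the same kind of knob on the command line, for example `git diff -M90%`.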

Detecting moved content works similarly: you slice every file into pieces, compute similarity metrics between all the slices and can then deduce that this slice which was deleted over here and this very similar slice which was added over there are actually the same slice that was moved from here to there.
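The slicing approach could be sketched roughly as follows. Again, this is only an illustration under stated assumptions, not how Git does it: the fixed window of `size` lines, the `threshold` value, and the helper names `windows` and `find_moves` are all hypothetical.

```python
from difflib import SequenceMatcher


def windows(text, size):
    """Yield (start_line, chunk) for every run of `size` consecutive lines."""
    lines = text.splitlines()
    for start in range(len(lines) - size + 1):
        yield start, "\n".join(lines[start:start + size])


def find_moves(before, after, size=3, threshold=0.8):
    """Find slices deleted in one place and added, nearly unchanged, in another.

    before / after: dicts mapping path -> content for two snapshots.
    Returns (old_path, old_start, new_path, new_start, score) tuples.
    """
    # Slices that existed in a file before but are gone from that file now.
    deleted = []
    for path, text in before.items():
        still_there = {c for _, c in windows(after.get(path, ""), size)}
        deleted += [(path, s, c) for s, c in windows(text, size)
                    if c not in still_there]

    # Slices that exist in a file now but were not in that file before.
    added = []
    for path, text in after.items():
        was_there = {c for _, c in windows(before.get(path, ""), size)}
        added += [(path, s, c) for s, c in windows(text, size)
                  if c not in was_there]

    # Compare every deleted slice with every added slice.
    moves = []
    for old_path, old_start, old_chunk in deleted:
        for new_path, new_start, new_chunk in added:
            score = SequenceMatcher(None, old_chunk, new_chunk).ratio()
            if score >= threshold:
                moves.append((old_path, old_start, new_path, new_start, score))
    return moves
```

The nested loop at the end compares every deleted slice with every added slice, so the work grows with the product of the two counts.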

However, as tonfa mentioned in his answer, this is very expensive, so it is not normally done. But it could be done, and that’s the point.

BTW: this is pretty much the exact opposite of the Operational Transformation model used by Google Wave, EtherPad, Gobby, SubEthaEdit, ACE and Co.
