How does Git solve the merging problem? [closed]

Git will not prevent conflicts in merges, but it can reconcile histories even when they do not share a common ancestor.
It does so through the grafts file (.git/info/grafts): a list, one entry per line, of a commit followed by its parents, which you can edit for that “reconciliation” purpose.
That alone is pretty powerful.
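
To illustrate the format described above, here is a minimal Python sketch that reads graft lines into a commit-to-parents mapping. The function name `parse_grafts` and the shortened hashes are hypothetical (real entries use full object IDs), and the handling of blank lines and `#` comments is an assumption of this sketch. Newer Git releases steer users toward `git replace --graft` for the same purpose.

```python
def parse_grafts(text):
    """Map each grafted commit to the parents it should appear to have."""
    grafts = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption of this sketch: blank lines and '#' comments are skipped.
        if not line or line.startswith("#"):
            continue
        # First field is the commit, the remaining fields are its parents.
        commit, *parents = line.split()
        grafts[commit] = parents
    return grafts


# Hypothetical, shortened hashes for readability.
sample = """\
# tip of the new history, grafted onto the old one
c3d4 a1b2
e5f6 c3d4 9f8e
"""
grafts = parse_grafts(sample)
print(grafts)  # {'c3d4': ['a1b2'], 'e5f6': ['c3d4', '9f8e']}
```

A commit listed with no parents at all would be treated as a root, which is how a graft can also cut history short.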

But to really get a glimpse of “how merges have been thought through”, you can start by turning to Linus himself, and realize that this issue is not so much about the “algorithm”:

Linus: Me personally, I want to have something that is very repeatable and non-clever. Something I understand or tells me that it can’t do it.
And quite frankly, merging single-file history without taking all the other files’ history into account makes me go “ugh”.

The important part of a merge is not how it handles conflicts (which need to be verified by a human anyway if they are at all interesting), but that it should meld the history together right so that you have a new solid base for future merges.

In other words, the important part is the trivial part: the naming of the parents, and keeping track of their relationship. Not the clashes.

And it looks like 99% of SCM people seem to think that the solution to that is to be more clever about content merges. Which misses the point entirely.
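
To make “the naming of the parents” concrete, here is a small Python sketch over a toy commit graph, assuming history is just a mapping from each commit to its list of parents. The names `ancestors`, `merge_bases` and the commit labels are hypothetical, and this is not Git's actual implementation; it only shows why recording both parents of a merge gives later merges a better starting point.

```python
def ancestors(parents, commit):
    """Return the commit plus everything reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }


history = {
    "root": [],
    "base": ["root"],
    "left": ["base"],
    "right": ["base"],
    "merge": ["left", "right"],  # a merge simply names both parents
    "left2": ["merge"],
    "right2": ["right"],
}
print(merge_bases(history, "left", "right"))    # {'base'}
print(merge_bases(history, "left2", "right2"))  # {'right'}
```

Because the first merge recorded both of its parents, the second merge starts from `right` rather than from `base`, so work that was already merged does not have to be reconciled again.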


So Wincent Colaiuta adds (emphasis mine):

There is no need for fancy metadata, rename tracking and so forth.
The only thing you need to store is the state of the tree before and after each change.

What files were renamed? Which ones were copied? Which ones were deleted? What lines were added? Which ones were removed? Which lines had changes made inside them? Which slabs of text were copied from one file to another?
You shouldn’t have to care about any of these questions and you certainly shouldn’t have to keep special tracking data in order to help you answer them: all the changes to the tree (additions, deletes, renames, edits etc) are implicitly encoded in the delta between the two states of the tree; you just track what is the content.

Absolutely everything can (and should) be inferred.
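
As a rough illustration of that inference, the Python sketch below classifies changes using nothing but two snapshots of a tree, each assumed to be a plain `{path: bytes}` mapping. The names `blob_id` and `infer_changes` are hypothetical, the hash is a plain SHA-1 of the content rather than Git's real object ID, and only exact-content renames are detected (Git can also match files that are merely similar).

```python
import hashlib


def blob_id(content):
    """Identify a file purely by its content (simplified, not Git's object ID)."""
    return hashlib.sha1(content).hexdigest()


def infer_changes(before, after):
    """Classify changes between two snapshots without any stored metadata."""
    old_ids = {p: blob_id(c) for p, c in before.items()}
    new_ids = {p: blob_id(c) for p, c in after.items()}
    deleted = {p for p in old_ids if p not in new_ids}
    added = {p for p in new_ids if p not in old_ids}
    modified = {p for p in old_ids if p in new_ids and old_ids[p] != new_ids[p]}

    # A deleted path and an added path with identical content read as a rename.
    renamed = {}
    by_id = {old_ids[p]: p for p in sorted(deleted)}
    for p in sorted(added):
        src = by_id.pop(new_ids[p], None)
        if src is not None:
            renamed[src] = p
    deleted -= set(renamed)
    added -= set(renamed.values())
    return {"added": added, "deleted": deleted,
            "modified": modified, "renamed": renamed}


before = {"README": b"hello\n", "src/a.py": b"print(1)\n", "old.txt": b"bye\n"}
after = {"README": b"hello world\n", "lib/a.py": b"print(1)\n", "new.txt": b"fresh\n"}
changes = infer_changes(before, after)
print(changes["renamed"])  # {'src/a.py': 'lib/a.py'}
```

Nothing in either snapshot says that `src/a.py` was moved; the rename falls out of comparing content.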

Git breaks the mould because it thinks about content, not files.
It doesn’t track renames, it tracks content. And it does so at a whole-tree level.
This is a radical departure from most version control systems.
It doesn’t bother trying to store per-file histories; it instead stores the history at the tree level.
When you perform a diff you are comparing two trees, not two files.
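
One way to picture a tree-level comparison is the Python sketch below, which assumes a tree is a nested dict whose leaves are file contents as bytes. Each tree gets an ID derived from the IDs of its children, so two subtrees with equal IDs can be skipped without looking inside. The names `tree_id` and `diff_trees` and the hashing scheme are hypothetical simplifications, not Git's real object format.

```python
import hashlib


def tree_id(node):
    """Hash a blob (bytes) or a tree (dict of name -> node) from content only."""
    if isinstance(node, bytes):
        return hashlib.sha1(b"blob\0" + node).hexdigest()
    h = hashlib.sha1(b"tree\0")
    for name in sorted(node):
        h.update(name.encode() + b"\0" + tree_id(node[name]).encode())
    return h.hexdigest()


def diff_trees(old, new, prefix=""):
    """Yield paths that differ, skipping any subtree whose ID is unchanged."""
    # A real store would keep these IDs instead of recomputing them.
    if tree_id(old) == tree_id(new):
        return
    for name in sorted(set(old) | set(new)):
        path = prefix + name
        a, b = old.get(name), new.get(name)
        if isinstance(a, dict) and isinstance(b, dict):
            yield from diff_trees(a, b, path + "/")
        elif a != b:
            yield path


old = {"README": b"hi\n",
       "src": {"a.py": b"1\n", "b.py": b"2\n"},
       "docs": {"index.md": b"x\n"}}
new = {"README": b"hi\n",
       "src": {"a.py": b"1\n", "b.py": b"3\n"},
       "docs": {"index.md": b"x\n"}}
changed = list(diff_trees(old, new))
print(changed)  # ['src/b.py']
```

The `docs` subtree is never walked because its ID is identical on both sides.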

The other fundamentally smart design decision is how Git does merges.
The merging algorithms are smart but they don’t try to be too smart. Unambiguous decisions are made automatically, but when there’s doubt it’s up to the user to decide.
This is the way it should be. You don’t want a machine making those decisions for you. You never will want it.
That’s the fundamental insight in the Git approach to merging: while every other version control system is trying to get smarter, Git is happily self-described as the “stupid content manager”, and it’s better for it.
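
The “unambiguous decisions are automatic, doubt goes to the user” rule can be sketched as a whole-file three-way merge in Python. The names `merge_file` and `merge_trees` are hypothetical, trees are assumed to be flat `{path: content}` mappings, and a missing path is represented by `None`. Git's real merge strategies go further and try to combine changes inside a file before reporting a conflict; this sketch stops at the file level.

```python
def merge_file(base, ours, theirs):
    """Three-way decision for one path; None means the path is absent."""
    if ours == theirs:
        return ours, False    # both sides agree (or neither changed)
    if ours == base:
        return theirs, False  # only their side changed
    if theirs == base:
        return ours, False    # only our side changed
    return None, True         # both changed differently: ask the user


def merge_trees(base, ours, theirs):
    """Merge flat {path: content} trees, collecting conflicts instead of guessing."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        result, conflict = merge_file(
            base.get(path), ours.get(path), theirs.get(path))
        if conflict:
            conflicts.append(path)
        elif result is not None:
            merged[path] = result  # a None result means the path was deleted
    return merged, conflicts


base = {"a": "1", "b": "1", "c": "1", "d": "1"}
ours = {"a": "2", "b": "1", "c": "2", "d": "1"}
theirs = {"a": "1", "b": "3", "c": "3"}  # "d" deleted on their side
merged, conflicts = merge_trees(base, ours, theirs)
print(merged)     # {'a': '2', 'b': '3'}
print(conflicts)  # ['c']
```

Every case with exactly one plausible answer is resolved silently; the one path where both sides diverged is handed back untouched.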
