What exactly does git’s “rebase --preserve-merges” do (and why?)

Question

As with a normal git rebase, git with --preserve-merges first identifies a list of commits made in one part of the commit graph, and then replays those commits on top of another part. The differences with --preserve-merges concern which commits are selected for replay and how that replaying works for merge commits.

To be more explicit about the main differences between normal and merge-preserving rebase:

Merge-preserving rebase is willing to replay (some) merge commits, whereas normal rebase completely ignores merge commits.
Because it’s willing to replay merge commits, merge-preserving rebase has to define what it means to replay a merge commit, and deal with some extra wrinkles:
- The most interesting part, conceptually, is perhaps in picking what the new commit’s merge parents should be.
- Replaying merge commits also requires explicitly checking out particular commits (git checkout <desired first parent>), whereas normal rebase doesn’t have to worry about that.
Merge-preserving rebase considers a shallower set of commits for replay:
- In particular, it will only consider replaying commits made since the most recent merge base(s) (i.e. the most recent time the two branches diverged), whereas normal rebase might replay commits going back to the first time the two branches diverged.
- This is provisional, but I believe this is ultimately a means to screen out replaying “old commits” that have already been “incorporated into” a merge commit.

First I will try to describe “sufficiently exactly” what rebase --preserve-merges does, and then there will be some examples. One can of course start with the examples, if that seems more useful.

The Algorithm in “Brief”

If you want to really get into the weeds, download the git source and explore the file git-rebase--interactive.sh. (At the time of writing, rebase was not part of Git’s C core, but rather was written as a shell script. And, behind the scenes, it shares code with “interactive rebase”.)

But here I will sketch what I think is the essence of it. In order to reduce the number of things to think about, I have taken a few liberties. (e.g. I don’t try to capture with 100% accuracy the precise order in which computations take place, and ignore some less central-seeming topics, e.g. what to do about commits that have already been cherry-picked between branches).

First, note that a non-merge-preserving rebase is rather simple. It’s more or less:

Find all commits on B but not on A ("git log A..B")
Reset B to A ("git reset --hard A") 
Replay all those commits onto B one at a time in order.
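
The selection step above can be sketched in Python on a toy commit graph. Everything here is illustrative: PARENTS, ancestors, topo_order and plain_rebase_todo are hypothetical names, the graph is hand-written rather than read from a repository, and the sketch only models which commits get chosen, not how their changes are reapplied.

```python
# Toy commit graph: commit -> list of parents (first parent first).
# Hand-written for illustration; not read from a real repository.
PARENTS = {
    "A": [],
    "B": ["A"], "C": ["B"],      # upstream branch, tip C
    "D": ["A"], "E": ["D"],
    "F": ["D"], "G": ["F"],      # side branch forked from D
    "m": ["E", "G"],             # merge commit
    "H": ["m"],                  # branch being rebased, tip H
}

def ancestors(tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen

def topo_order(commits):
    """Order the given commits so that parents come before children."""
    order, seen = [], set()
    def visit(c):
        if c in seen or c not in commits:
            return
        seen.add(c)
        for p in PARENTS[c]:
            visit(p)
        order.append(c)
    for c in sorted(commits):    # sorted only to make the result deterministic
        visit(c)
    return order

def plain_rebase_todo(upstream, branch):
    """Commits on branch but not on upstream, with merge commits dropped."""
    selected = ancestors(branch) - ancestors(upstream)   # like "git log upstream..branch"
    return [c for c in topo_order(selected) if len(PARENTS[c]) < 2]

print(plain_rebase_todo("C", "H"))   # ['D', 'E', 'F', 'G', 'H']
```

The merge commit m falls inside the range but is filtered out, which is the sense in which a normal rebase ignores merges.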

Rebase --preserve-merges is comparatively complicated. Here’s as simple as I’ve been able to make it without losing things that seem pretty important:

Find the commits to replay:
  First find the merge-base(s) of A and B (i.e. the most recent common ancestor(s))
    This (these) merge base(s) will serve as a root/boundary for the rebase.
    In particular, we'll take its (their) descendants and replay them on top of new parents
  Now we can define C, the set of commits to replay. In particular, it's those commits:
    1) reachable from B but not A (as in a normal rebase), and ALSO
    2) descendants of the merge base(s)
  If we ignore cherry-picks and other cleverness preserve-merges does, it's more or less:
    git log A..B --not $(git merge-base --all A B)
Replay the commits:
  Create a branch B_new, on which to replay our commits.
  Switch to B_new (i.e. "git checkout B_new")
  Proceeding parents-before-children (--topo-order), replay each commit c in C on top of B_new:
    If it's a non-merge commit, cherry-pick as usual (i.e. "git cherry-pick c")
    Otherwise it's a merge commit, and we'll construct an "equivalent" merge commit c':
      To create a merge commit, its parents must exist and we must know what they are.
      So first, figure out which parents to use for c', by reference to the parents of c:
        For each parent p_i in parents_of(c):
          If p_i is one of the merge bases mentioned above:
            # p_i is one of the "boundary commits" that we no longer want to use as parents
            For the new commit's ith parent (p_i'), use the HEAD of B_new.
          Else if p_i is one of the commits being rewritten (i.e. if p_i is in C):
            # Note: Because we're moving parents-before-children, a rewritten version
            # of p_i must already exist. So reuse it:
            For the new commit's ith parent (p_i'), use the rewritten version of p_i.
          Otherwise:
            # p_i is one of the commits that's *not* slated for rewrite. So don't rewrite it
            For the new commit's ith parent (p_i'), use p_i, i.e. the old commit's ith parent.
      Second, actually create the new commit c':
        Go to p_1'. (i.e. "git checkout p_1'", p_1' being the "first parent" we want for our new commit)
        Merge in the other parent(s):
          For a typical two-parent merge, it's just "git merge p_2'".
          For an octopus merge, it's "git merge p_2' p_3' p_4' ...".
        Switch (i.e. "git reset") B_new to the current commit (i.e. HEAD), if it's not already there
  Change the label B to apply to this new branch, rather than the old one. (i.e. "git checkout B" followed by "git reset --hard B_new")
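
As a rough check of the outline above, here is a minimal Python sketch that models only the bookkeeping: which commits are selected and which parents each replayed commit receives. It makes several assumptions that are not in the outline: the graph and all function names are hypothetical, the parent-mapping rule is applied to every replayed commit (not just merges), and "the HEAD of B_new" is modeled as the fixed tip of the upstream branch. No changes are actually merged or cherry-picked.

```python
# Hypothetical toy graph: commit -> parents (first parent first).
# Upstream is r, u1, u2, u3 (tip u3). The branch forks at r with commit s,
# merges in the upstream commit u2 at m, then adds t (tip t).
PARENTS = {
    "r": [], "u1": ["r"], "u2": ["u1"], "u3": ["u2"],
    "s": ["r"],
    "m": ["s", "u2"],
    "t": ["m"],
}

def ancestors(tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen

def merge_bases(a, b):
    """Common ancestors that are not a proper ancestor of another common ancestor."""
    common = ancestors(a) & ancestors(b)
    return {c for c in common
            if not any(c != d and c in ancestors(d) for d in common)}

def topo_order(commits):
    """Order the given commits so that parents come before children."""
    order, seen = [], set()
    def visit(c):
        if c in seen or c not in commits:
            return
        seen.add(c)
        for p in PARENTS[c]:
            visit(p)
        order.append(c)
    for c in sorted(commits):
        visit(c)
    return order

def rewrite(upstream, branch):
    """Return (replay order, parents of each replayed commit)."""
    bases = merge_bases(upstream, branch)
    candidates = ancestors(branch) - ancestors(upstream)
    # Keep only candidates that descend from a merge base.
    todo = topo_order({c for c in candidates if ancestors(c) & bases})
    rewritten, new_parents = {}, {}
    for c in todo:
        mapped = []
        for p in PARENTS[c]:
            if p in bases:              # boundary commit: swap in the upstream tip
                mapped.append(upstream)
            elif p in rewritten:        # already replayed: reuse the new copy
                mapped.append(rewritten[p])
            else:                       # not slated for rewrite: keep as-is
                mapped.append(p)
        rewritten[c] = c + "'"
        new_parents[rewritten[c]] = mapped
    return todo, new_parents

todo, new_parents = rewrite("u3", "t")
print(todo)          # ['m', 't']
print(new_parents)   # {"m'": ['s', 'u3'], "t'": ["m'"]}
```

In this toy graph, s is reachable from the branch but not from upstream, yet it is not a descendant of the merge base u2, so it is left alone and becomes an unchanged parent of the new merge commit.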

Rebase with an --onto C argument (where C now names a commit, not the set of commits above) should be very similar. Just instead of starting commit playback at the HEAD of B, you start commit playback at the HEAD of C instead. (And use C_new instead of B_new.)

Example 1

For example, take commit graph

  B---C <-- master
 /                     
A-------D------E----m----H <-- topic
         \         /
          F-------G

m is a merge commit with parents E and G.

Suppose we rebased topic (H) on top of master (C) using a normal, non-merge-preserving
rebase. (For example, checkout topic; rebase master.) In that case, git would select
the following commits for replay:

pick D
pick E
pick F
pick G
pick H

and then update the commit graph like so:

  B---C <-- master
 /     \                
A       D'---E'---F'---G'---H' <-- topic

(D’ is the replayed equivalent of D, etc.)

Note that merge commit m is not selected for replay.

Suppose we instead did a --preserve-merges rebase of H on top of C. (For example, checkout topic; rebase --preserve-merges master.) In this new case, git would select the following commits for replay:

pick D
pick E
pick F (onto D’ in the ‘subtopic’ branch)
pick G (onto F’ in the ‘subtopic’ branch)
pick Merge branch ‘subtopic’ into topic
pick H

Now m was chosen for replay. Also note that merge parents E and G were
picked for inclusion before merge commit m.

Here is the resulting commit graph:

 B---C <-- master
/     \                
A      D'-----E'----m'----H' <-- topic
        \          / 
         F'-------G'

Again, D’ is a cherry-picked (i.e. recreated) version of D. Same for E’, etc. Every commit not on master has been replayed. Both E and G (the merge parents of m) have been recreated as E’ and G’ to serve as the parents of m’, so the shape of the history remains the same after the rebase.

Example 2

Unlike with normal rebase, merge-preserving rebase can create multiple
children of the upstream head.

For example, consider:

  B---C <-- master
 /                     
A-------D------E---m----H <-- topic
 \                 |
  ------- F-----G--/

If we rebase H (topic) on top of C (master), then the commits chosen for rebase are:

pick D
pick E
pick F
pick G
pick m
pick H

And the result is like so:

  B---C  <-- master
 /    | \                
A     |  D'----E'---m'----H' <-- topic
       \            |
         F'----G'---/
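
The two children of C in this result can be reproduced with a small Python sketch of the parent-mapping rule. It rests on assumptions not stated in the example: that E is the first parent of m, that A is the only merge base of master and topic, and that a parent equal to the merge base is replaced by the fixed upstream tip C. The names OLD_PARENTS, TODO and replay are hypothetical.

```python
# Parents of the commits chosen for replay in Example 2 (first parent first).
# Assumption: m's first parent is E and its second parent is G.
OLD_PARENTS = {
    "D": ["A"], "E": ["D"],
    "F": ["A"], "G": ["F"],
    "m": ["E", "G"],
    "H": ["m"],
}
TODO = ["D", "E", "F", "G", "m", "H"]   # pick order listed in the example
MERGE_BASES = {"A"}                      # assumed merge base of master and topic
ONTO = "C"                               # tip of master

def replay(todo, old_parents, merge_bases, onto):
    """Map each replayed commit to the parents of its new copy."""
    rewritten, new_parents = {}, {}
    for c in todo:
        mapped = []
        for p in old_parents[c]:
            if p in merge_bases:
                mapped.append(onto)                 # boundary parent becomes the upstream tip
            else:
                mapped.append(rewritten.get(p, p))  # new copy if one exists, else unchanged
        rewritten[c] = c + "'"
        new_parents[rewritten[c]] = mapped
    return new_parents

new_parents = replay(TODO, OLD_PARENTS, MERGE_BASES, ONTO)
children_of_upstream = sorted(c for c, ps in new_parents.items() if ONTO in ps)
print(children_of_upstream)   # ["D'", "F'"]
print(new_parents["m'"])      # ["E'", "G'"]
```

Both D and F had the merge base A as their parent, so both of their copies end up directly on top of C.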

Example 3

In the above examples, both the merge commit and its two parents are replayed commits, rather than the original parents that the original merge commit had. However, in other rebases a replayed merge commit can end up with parents that were already in the commit graph before the merge.

For example, consider:

  B--C---D <-- master
 /    \                
A---E--m------F <-- topic

If we rebase topic onto master (preserving merges), then the commits to replay will be

pick merge commit m
pick F

The rewritten commit graph will look like so:

                     B--C--D <-- master
                    /       \             
                   A-----E---m'--F' <-- topic

Here replayed merge commit m’ gets parents that pre-existed in the commit graph, namely D (the HEAD of master) and E (one of the parents of the original merge commit m).

Example 4

Merge-preserving rebase can get confused in certain “empty commit” cases. At least this is true in some older versions of git (e.g. 1.7.8).

Take this commit graph:

                   A--------B-----C-----m2---D <-- master
                    \        \         /
                      E--- F--\--G----/
                            \  \
                             ---m1--H <--topic

Note that both commit m1 and m2 should have incorporated all the changes from B and F.

If we try to do git rebase --preserve-merges of H (topic) onto D (master), then the following commits are chosen for replay:

pick m1
pick H

Note that the changes (B, F) united in m1 should already be incorporated into D. (Those changes should already be incorporated into m2, because m2 merges together the children of B and F.) Therefore, conceptually, replaying m1 on top of D should probably either be a no-op or create an empty commit (i.e. one where the diff between successive revisions is empty).

Instead, however, git may reject the attempt to replay m1 on top of D. You can get an error like so:

error: Commit 90caf85 is a merge but no -m option was given.
fatal: cherry-pick failed

It looks like one forgot to pass a flag to git, but the underlying problem is that git dislikes creating empty commits.
