Git Merge Strategies

Introduction

Imagine I have a master branch with one commit:

75eb1cb - (origin/master) README

This is a single README.md file with the following content:

- A: 1

Now imagine I have a branch from master called feat/foo and in that branch I’ve made 3 additional commits:

* 41d4115 - Add C (also revert A)
* 9e5626c - Modify A
* 8e7965e - Add B

The contents of the README.md file is now:

- A: 1
- B: 2
- C: 3

Just to quickly clarify, you’ll notice throughout this post that I use the command git lg which is actually an alias I have set within my ~/.gitconfig that uses git log but modifies its behaviour with some additional git flags:

log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr)%Creset' --abbrev-commit --date=relative

git merge

So git merge is the standard workhorse for merging branches in git. It’ll try to resolve the differences between the two branches the best way it can.

If the source branch feat/foo (the branch you want to merge from) can be merged cleanly (e.g. there are no major diverges from the destination branch master, which is the branch the changes are being merged into), then git will be able to perform a simple “fast-forward”.

What “fast-forward: means is that git will change the HEAD (on the destination branch) to point to the new latest commit, and all the other commits from your source branch will also appear in the git log/history of the destination branch.

Note: HEAD is an alias that points to a commit (typically HEAD is the latest commit in your branch). Even the branch name itself is an alias that refers to a commit (most things in git do simply resolve to commits). This is why when you have a long branch name, instead of git push origin really-long-branch-name you can just use git push origin head and git will figure out which branch you’re on

If you check git lg after doing a git merge feat/foo, you should see something like:

* 41d4115 - (HEAD -> master, origin/feat/foo, feat/foo) Add C (also revert A)
* 9e5626c - Modify A
* 8e7965e - Add B
* 75eb1cb - (origin/master) README

We can see all the commits from feat/foo were replayed onto master successfully.

Note: you might not realise that there is a short cut to checking out a branch and then merging another branch into it: git merge <source> <destination>, which is the same as doing git checkout <destination> followed by git merge <source>

git merge --no-ff --edit

Let’s say you wanted a “merge commit” to happen (i.e. merge commits typically only occur if there has been a divergence between the branches which means git has to resolve the problem for you), then you can force git to use a “merge commit” even when there is no need for one (as is the case for me here).

Using our previous example, which merged cleanly, let’s say that a merge commit is what we wanted to have happen. Assuming you’ve not pushed the branch to a remote, then you can safely go back to before the merge occurred using:

git reset --hard 75eb1cb

Note: 75eb1cb being my first commit in master

git reset

It’s important to understand how git reset works, as it has three flags and if not used correctly could have bad side effects. The flags are:

  • --soft
  • --mixed
  • --hard

The way reset works is that you use one of the above flags, followed by the commit you want to reset the HEAD back to. So in our case we used the commit 75eb1cb, which was our very first commit.

If I had used the --soft flag instead, then it would have reset the HEAD back to the first commit, but any other commits that happened since would have their changes staged together in our git ‘index’ waiting to be committed.

If I had used the --mixed flag instead, then it would have reset the HEAD back to the first commit, but any other commits that happened since would have their changes applied to the working directory, ready for us to choose which changes to be added to the index (i.e. staged) and then finally committed.

When using --hard though, any of the changes that came after the commit being reset to, are lost. They’re not sitting in your staging index, nor are they available within your working directory either.

So be careful whenever using the --hard flag.

Force the merge commit

Now we’re back to where we were originally (a separate feat/foo branch and a master branch with a single commit), we can look at how to force a merge commit.

To force a merge commit you’ll need to use the --no-ff flag and then also use the --edit flag to allow you to modify the default merge commit message (otherwise git will provide its own commit message which is nearly always not useful or descriptive):

git merge --edit --no-ff feat/foo 

Note: --edit doesn’t work without --no-ff, unless there is a genuine merge conflict

Now if I look at my git lg I can see:

*   97f1257 - (HEAD -> master) My custom merge commit message for 'feat/foo'
|\  
| * 41d4115 - (origin/feat/foo, feat/foo) Add C (also revert A)
| * 9e5626c - Modify A
| * 8e7965e - Add B
|/  
* 75eb1cb - (origin/master) README

We can see all the commits from feat/foo were replayed onto master successfully, but now you’re able to more easily distinguish the three commits came from another branch (if using my git lg alias). Which is one of the main reasons to force a merge commit using --no-ff as it really helps keep a varied branch history.

Notice git log will also show in its output for the merge commit
a field like Merge: 75eb1cb 8e7965e 9e5626c 41d4115
Which helps (at a glance) to know more about what commits are inside the merge commit

git branch --contains

The following command can be useful in locating where a commit has come from:

git branch --contains 9e5626c

In our case this will indicate that the commit we specified is part of our master branch. Now when you use --contains with a commit such as 9e5626c (which was merged in from our feature branch) you’ll see that git recognises this commit is part of multiple branches †.

† until you delete the branch (e.g. git branch -D feat/foo)

Losing useful history

It’s also worth mentioning, that even after the feat/foo branch has been deleted, git will still show (via git log --graph) those commits from our feat/foo branch as coming from an alternative path/branch history.

This is a useful bit of information that can be lost when using other tools such as git rebase or git merge --squash, so you should discuss with your team what type of information you feel is useful to have when you look back at a project’s git history before forging ahead with any one of the strategies I cover here.

For example, some teams don’t find being able to see that a set of commits actually came from another branch very useful: considering all commits/features should generally come in from separate branches/Pull Requests. So the use of rebase or squash isn’t a concern for them. For a team like this, an aesthetically ‘cleaner’ git commit history is preferred.

Also, in teams where I’ve worked and they’ve utilised a ‘squash’ strategy (see below for more details), we’ve used the following structure for our commit message so it’s clearer what’s been squashed:

Closes #123 - New Feature X

Squashed commit of the following:

commit c7e4145f6e95e51fcff79d6b3476bcb19c058071
commit 3275f1805c4f82298676aa3c61db8c65ee9f3428
commit bb50fb69c2d131d0126fa9ae018377e6451678e2
commit 7ceb49c352d812a91db0e87a8ed4c4cf426c0365
commit 86d1de3c5133a403edf45343081353055c02b454
commit 8f48e5b3c43acf71e8abab4b821cfdc66447b732
commit ed857784feff091ece52d906e311ef7f64a49c3d
commit a277e60c39333a55134c3e3ef6d97076f9bc8370
commit dd7e1973fe91f29887928aad9d991be24efb143a
commit ff7e7dabf745ac4d73b52644c3d29ea05d5c318f
commit 36f1c5bc5949f01117c1d57e6ab12f05c2a202f5

git merge --squash

So what if you don’t want all those commits in your master? You could instead “squash” all the commits down into a single commit using the --squash command:

git merge --squash feat/foo

Now what this does is take my changes from the source branch feat/foo and automatically squashes those separate commits into a single change that’s placed into the staging area of my destination branch.

These collection of changes now appear as a single change to the file. They aren’t actually merged yet. So you have the opportunity to change the commit message:

git commit -m "your own custom commit message"

git rebase

The git rebase feature in essence is solving the same problem as git merge (they both integrate a set of changes), but they do them in fundamentally different ways.

With git merge a merge commit is utilised to resolve conflicts and so is considered non-destructive. What this means is that the commits within either branch (destination or source) aren’t modified in any way.

With git rebase the source branch commits are placed before the destination branch’s commits, but also the source commits are recreated inside the destination branch.

Let’s look and see what this does for us:

git rebase feat/foo

We can see that as there were no conflicts, git was able to “fast-forward” the commits. So in theory this is no different right now from originally doing git merge feat/foo.

But what if master had a new change committed to it, and this change happened after we had branched off with feat/foo? For example, I’ll add a second commit to master that changes - A: 1 to - A: 9.

If I run git rebase feat/foo I should see we get a merge conflict and one that git doesn’t know how to resolve:

First, rewinding head to replay your work on top of it...
Applying: A to 9
Using index info to reconstruct a base tree...
M	README.md
Falling back to patching base and 3-way merge...
Auto-merging README.md
CONFLICT (content): Merge conflict in README.md
error: Failed to merge in the changes.
Patch failed at 0001 A to 9
The copy of the patch that failed is found in: .git/rebase-apply/patch

When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".

We can see from the information git has given us that it first rewinded master back to the first commit 75eb1cb in order for it to place our feat/foo commits on top of it (as that initial commit is where our branch originally forked from).

From there we can see once git replayed our feat/foo commits on top of 75eb1cb that it then tried to apply the new commit that feat/foo didn’t have (e.g. Applying: A to 9) and it failed to do so.

Git tells us that there was a merge conflict:

CONFLICT (content): Merge conflict in README.md

It’s up to us to open README.md and to resolve the conflict ourself. When I open the file I see:

<<<<<<< 41d411564c1dc3106f03427d1b5920d05d95e037
- A: 1
- B: 2
- C: 3
||||||| merged common ancestors
- A: 1
=======
- A: 9
>>>>>>> A to 9

So the above shows the file is split into three:

  1. <<<<<<< <commit_hash>
  2. ||||||| merged common ancestors
  3. >>>>>>> <commit_message>

I know that I’m happy for the line - A: 1 (which was changed in my feat/foo branch commit 41d4115) to be changed to - A: 9 (which was changed in master after I originally branched from it). So I manually make that change by deleting all the added noise (e.g. <<<<<<< and ||||||| merged common ancestors etc) so I’m left with just the content the file should be expected to have now.

I update it to look like:

- A: 9
- B: 2
- C: 3

I now must run the following commands:

  • git add README.md (as I’ve made a change to the file at this point in time)
  • git rebase --continue

We see that git is trying again now to apply the commit (but this time there is no merge conflict info inside of the README) and so we see the output:

Applying: A to 9

Now when looking at the output from git lg I see:

* 7c001cd - (HEAD -> master) A to 9
* 41d4115 - (origin/feat/foo, feat/foo) Add C (also revert A)
* 9e5626c - Modify A
* 8e7965e - Add B
* 75eb1cb - (origin/master) README

This shows that the changes from feat/foo where replayed directly on top of 75eb1cb. Otherwise if we didn’t use git’s rebase feature but a standard git merge, we could’ve ended up with a git history that looked like the following:

* 41d4115 - (origin/feat/foo, feat/foo) Add C (also revert A)
* 9e5626c - Modify A
* 8e7965e - Add B
* 65553e0 - (HEAD -> master) A to 9
* 75eb1cb - (origin/master) README

Notice the feat/foo commits are on top of the A to 9 commit and that might not necessarily be what we want to have happen.

Note: it’s usually better to use git pull --rebase <remote> <branch> as this will ensure that you get the latest copy of changes for the specified branch (as apposed to git rebase <branch> which will just be the local copy of that branch (remember git pull is an abstraction on top of git fetch, then git merge).

git rebase --interactive

The --interactive flag is useful for letting us rewrite our git history. We’re able to move the order of our commits as well as squash commits down and change their recorded message.

So let’s assume we want to squash all but the first commit in our feat/foo branch. By that I mean we currently have:

* b4f9dfd - (HEAD -> feat/foo) Add C (also revert A)
* 7354a41 - Modify A
* c321b40 - Add B
* 75eb1cb - (origin/master) README

Let’s say we want “Add B”, “Modify A” and “Add C (also revert A)” squashed into one commit. To do this we need to locate the parent commit of the earliest commit we want to squash.

So “Add B” is the earliest commit we want as part of the squash, so the parent commit is “README”. To action the rebase let’s run the following command:

git rebase --interactive 75eb1cb

This drops us into an editor with the following output:

pick c321b40 Add B
pick 7354a41 Modify A
pick b4f9dfd Add C (also revert A)

# Rebase 75eb1cb..b4f9dfd onto 75eb1cb (3 command(s))
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
# d, drop = remove commit
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

We can modify it like so:

pick c321b40 Add B
squash 7354a41 Modify A
squash b4f9dfd Add C (also revert A)

This will result in the following combined commit details:

# This is a combination of 3 commits.
# The first commit's message is:
Add B

# This is the 2nd commit message:

Modify A

# This is the 3rd commit message:

Add C (also revert A)

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Date:      Sun May 15 17:29:32 2016 +0100
#
# interactive rebase in progress; onto 75eb1cb
# Last commands done (3 commands done):
#    squash 7354a41 Modify A
#    squash b4f9dfd Add C (also revert A)
# No commands remaining.
# You are currently editing a commit while rebasing branch 'feat/foo' on '75eb1cb'.
#
# Changes to be committed:
#	modified:   README.md
#

Now if we run git lg -p we’ll see the new squashed commit does indeed contain all the previous commit’s contents:

* b63857d - (HEAD -> feat/foo) Add B (16 minutes ago)| 
| diff --git a/README.md b/README.md
| index 428f59e..f2e26b6 100644
| --- a/README.md
| +++ b/README.md
| @@ -1 +1,3 @@
|  - A: 1
| +- B: 2
| +- C: 3

git rebase --onto

Imagine we’ve merged our feat/foo branch at this point into master using:

git merge --squash feat/foo

Note: you’ll need to fix a conflict first for it to be successful

So master should now have three commits:

* 19ec1bb - (HEAD -> master) Merge feat/foo
* 3fc460b - A to 9
* 75eb1cb - (origin/master) README

What’s the easiest way to delete the middle/second commit 3fc460b? We could use git rebase --interactive to delete the commit from history, but there is an alternative that’s much easier:

git rebase --onto 75eb1cb 3fc460b

Note: in this scenario you’ll get a conflict that you’ll need to resolve first (e.g. we’re removing a commit that sets A to the value 9 but that change was also pulled into the feat/foo branch so git isn’t sure whether you definitely want that change any more or not), but in most cases you’ll likely have a clean rebase

The basic structure of this command is:

git rebase --onto <commit_to_become_new_base> <commit_to_delete>

For more information see the documentation for git rebase:

git format-patch

At this point you’re likely using a service such as GitHub or GitLab for creating projects and opening pull requests, as apposed to Git’s own native pull request feature which is substantially less feature rich than these commercial abstraction layers.

But sometimes just accepting a ‘patch’ from someone and being able to apply it quickly and easily is what you want to do. So that’s where git format-patch comes in.

Imagine you have a centralised master branch and someone has branched off from its HEAD to a new branch called cool-new-features and they would like you to merge their changes directly with the centralised repository’s master branch.

This person would need to execute the following command:

git format-patch master

Note: you can swap the branch master for any valid commit, alias or range

What this will end up doing is generating a ‘patch’ file for each new commit that isn’t available in master. Below is an example patch file generated from a test repo I was messing around with, and which actually generated two patch files for me (this being the first one):

From 64a903d2ed6b4280d4a0914aaf50f014ae05cdd3 Mon Sep 17 00:00:00 2001
From: Integralist <mark.mcdx@gmail.com>
Date: Tue, 31 May 2016 08:28:56 +0100
Subject: [PATCH 1/2] G

---
 foo.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/foo.txt b/foo.txt
index b1e6722..6f04b1d 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1,3 +1,4 @@
 A
 B
 C
+G
-- 
2.7.4

Note: if you want a single patch file you can use
the --stdout flag and redirect the output to a file
git format-patch master --stdout > new-feature.patch

The person who generates the patch file(s) will then need to send them to you (which can be done using git send-email):

git send-email -to devlist@example.org 0001-A.patch

If it’s you sending the patch via git, then you may need to configure git to use your mail server details:

git config --global sendemail.smtpserver smtp.my-isp.com
git config --global sendemail.smtpserverport 465

You, as the recipient of the patch file(s), can then review and apply the patch using:

git checkout review-new-feature
cat new-feature.patch | git am # single patch file
cat *.patch | git am           # multiple patch files

Also of interest, if using GitHub for Pull Requests, is that you can add a .patch extension to the end of a PR path or commit path for it to generate a patch for you! So you can utilise GitHub for some of the nice ‘review’ features, but then utilise classic/traditional communication and application of patches if you so choose (maybe for an older/internal system).

So if you have a GitHub PR URL like https://github.com/my-org/my-repo/pull/123, then you can convert this into a patch file using https://github.com/my-org/my-repo/pull/123.patch

Git also offers you the git apply command to use in place of git am. The reason being is that git am actually commits the changes in the patch, whereas git apply will only affect your working directory, so you’ll have the opportunity to stage and commit the changes however you like. Unless you use the --cached or --index flags (see man git-apply for details).

Note: git apply also has a --reverse flag to manipulate the order when applying multiple patchess

The other difference is that git am only accepts patch files, whereas git apply accepts patch files and also output from git diff. So you have more options available to you that way. For example:

curl https://gist.githubusercontent.com/anonymous/x/raw/x/test.diff | git apply

Conclusion

There are so many aspects to merging commits and dealing with git’s commit history, that it’s difficult to cover everything without people having to mentally store too much information that most of the time you wont utilise.

For example, I’ve not covered anything to do with pulling commits: git pull --strategy, git pull --squash, git pull --rebase, git pull --ff-only and git pull --no-commit. Each have their use cases, but I think sometimes you’re better picking a single strategy and defining it as a standard within your development team.

If you’re interested in one git workflow approach that utilises git’s rebasing feature, and I’ve used with success in the past at the BBC, then I recommend you have a read of this blog post I wrote a few years ago: integralist.co.uk/posts/github-workflow

I’ve also written about other types of git “workflows” as part of BBC News’ “Coding Best Practices” working group: github.com/bbc/news-coding-best-practices/git-workflow