Git Merge Strategies
Introduction
Imagine I have a master
branch with one commit:
75eb1cb - (origin/master) README
This is a single README.md
file with the following content:
- A: 1
Now imagine I have a branch from master
called feat/foo
and in that branch I’ve made 3 additional commits:
* 41d4115 - Add C (also revert A)
* 9e5626c - Modify A
* 8e7965e - Add B
The contents of the README.md
file is now:
- A: 1
- B: 2
- C: 3
Just to quickly clarify, you’ll notice throughout this post that I use the command git lg
which is actually an alias I have set within my ~/.gitconfig
that uses git log
but modifies its behaviour with some additional git flags:
log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr)%Creset' --abbrev-commit --date=relative
git merge
So git merge
is the standard workhorse for merging branches in git. It’ll try to resolve the differences between the two branches the best way it can.
If the source branch feat/foo
(the branch you want to merge from) can be merged cleanly (e.g. there are no major diverges from the destination branch master
, which is the branch the changes are being merged into), then git will be able to perform a simple “fast-forward”.
What “fast-forward: means is that git will change the HEAD
(on the destination branch) to point to the new latest commit, and all the other commits from your source branch will also appear in the git log/history of the destination branch.
Note:
HEAD
is an alias that points to a commit (typicallyHEAD
is the latest commit in your branch). Even the branch name itself is an alias that refers to a commit (most things in git do simply resolve to commits). This is why when you have a long branch name, instead ofgit push origin really-long-branch-name
you can just usegit push origin head
and git will figure out which branch you’re on
If you check git lg
after doing a git merge feat/foo
, you should see something like:
* 41d4115 - (HEAD -> master, origin/feat/foo, feat/foo) Add C (also revert A)
* 9e5626c - Modify A
* 8e7965e - Add B
* 75eb1cb - (origin/master) README
We can see all the commits from feat/foo
were replayed onto master
successfully.
Note: you might not realise that there is a short cut to checking out a branch and then merging another branch into it:
git merge <source> <destination>
, which is the same as doinggit checkout <destination>
followed bygit merge <source>
git merge --no-ff --edit
Let’s say you wanted a “merge commit” to happen (i.e. merge commits typically only occur if there has been a divergence between the branches which means git has to resolve the problem for you), then you can force git to use a “merge commit” even when there is no need for one (as is the case for me here).
Using our previous example, which merged cleanly, let’s say that a merge commit is what we wanted to have happen. Assuming you’ve not pushed the branch to a remote, then you can safely go back to before the merge occurred using:
git reset --hard 75eb1cb
Note:
75eb1cb
being my first commit inmaster
git reset
It’s important to understand how git reset
works, as it has three flags and if not used correctly could have bad side effects. The flags are:
--soft
--mixed
--hard
The way reset works is that you use one of the above flags, followed by the commit you want to reset the HEAD
back to. So in our case we used the commit 75eb1cb
, which was our very first commit.
If I had used the --soft
flag instead, then it would have reset the HEAD
back to the first commit, but any other commits that happened since would have their changes staged together in our git ‘index’ waiting to be committed.
If I had used the --mixed
flag instead, then it would have reset the HEAD
back to the first commit, but any other commits that happened since would have their changes applied to the working directory, ready for us to choose which changes to be added to the index (i.e. staged) and then finally committed.
When using --hard
though, any of the changes that came after the commit being reset to, are lost. They’re not sitting in your staging index, nor are they available within your working directory either.
So be careful whenever using the --hard
flag.
Force the merge commit
Now we’re back to where we were originally (a separate feat/foo
branch and a master
branch with a single commit), we can look at how to force a merge commit.
To force a merge commit you’ll need to use the --no-ff
flag and then also use the --edit
flag to allow you to modify the default merge commit message (otherwise git will provide its own commit message which is nearly always not useful or descriptive):
git merge --edit --no-ff feat/foo
Note:
--edit
doesn’t work without--no-ff
, unless there is a genuine merge conflict
Now if I look at my git lg
I can see:
* 97f1257 - (HEAD -> master) My custom merge commit message for 'feat/foo'
|\
| * 41d4115 - (origin/feat/foo, feat/foo) Add C (also revert A)
| * 9e5626c - Modify A
| * 8e7965e - Add B
|/
* 75eb1cb - (origin/master) README
We can see all the commits from feat/foo
were replayed onto master
successfully, but now you’re able to more easily distinguish the three commits came from another branch (if using my git lg
alias). Which is one of the main reasons to force a merge commit using --no-ff
as it really helps keep a varied branch history.
Notice
git log
will also show in its output for the merge commit
a field likeMerge: 75eb1cb 8e7965e 9e5626c 41d4115
Which helps (at a glance) to know more about what commits are inside the merge commit
git branch --contains
The following command can be useful in locating where a commit has come from:
git branch --contains 9e5626c
In our case this will indicate that the commit we specified is part of our master
branch. Now when you use --contains
with a commit such as 9e5626c
(which was merged in from our feature branch) you’ll see that git recognises this commit is part of multiple branches †.
† until you delete the branch (e.g.
git branch -D feat/foo
)
Losing useful history
It’s also worth mentioning, that even after the feat/foo
branch has been deleted, git will still show (via git log --graph
) those commits from our feat/foo
branch as coming from an alternative path/branch history.
This is a useful bit of information that can be lost when using other tools such as git rebase
or git merge --squash
, so you should discuss with your team what type of information you feel is useful to have when you look back at a project’s git history before forging ahead with any one of the strategies I cover here.
For example, some teams don’t find being able to see that a set of commits actually came from another branch very useful: considering all commits/features should generally come in from separate branches/Pull Requests. So the use of rebase or squash isn’t a concern for them. For a team like this, an aesthetically ‘cleaner’ git commit history is preferred.
Also, in teams where I’ve worked and they’ve utilised a ‘squash’ strategy (see below for more details), we’ve used the following structure for our commit message so it’s clearer what’s been squashed:
Closes #123 - New Feature X
Squashed commit of the following:
commit c7e4145f6e95e51fcff79d6b3476bcb19c058071
commit 3275f1805c4f82298676aa3c61db8c65ee9f3428
commit bb50fb69c2d131d0126fa9ae018377e6451678e2
commit 7ceb49c352d812a91db0e87a8ed4c4cf426c0365
commit 86d1de3c5133a403edf45343081353055c02b454
commit 8f48e5b3c43acf71e8abab4b821cfdc66447b732
commit ed857784feff091ece52d906e311ef7f64a49c3d
commit a277e60c39333a55134c3e3ef6d97076f9bc8370
commit dd7e1973fe91f29887928aad9d991be24efb143a
commit ff7e7dabf745ac4d73b52644c3d29ea05d5c318f
commit 36f1c5bc5949f01117c1d57e6ab12f05c2a202f5
git merge --squash
So what if you don’t want all those commits in your master
? You could instead “squash” all the commits down into a single commit using the --squash
command:
git merge --squash feat/foo
Now what this does is take my changes from the source branch feat/foo
and automatically squashes those separate commits into a single change that’s placed into the staging area of my destination branch.
These collection of changes now appear as a single change to the file. They aren’t actually merged yet. So you have the opportunity to change the commit message:
git commit -m "your own custom commit message"
git rebase
The git rebase
feature in essence is solving the same problem as git merge
(they both integrate a set of changes), but they do them in fundamentally different ways.
With git merge
a merge commit is utilised to resolve conflicts and so is considered non-destructive. What this means is that the commits within either branch (destination or source) aren’t modified in any way.
With git rebase
the source branch commits are placed before the destination branch’s commits, but also the source commits are recreated inside the destination branch.
Let’s look and see what this does for us:
git rebase feat/foo
We can see that as there were no conflicts, git was able to “fast-forward” the commits. So in theory this is no different right now from originally doing git merge feat/foo
.
But what if master
had a new change committed to it, and this change happened after we had branched off with feat/foo
? For example, I’ll add a second commit to master
that changes - A: 1
to - A: 9
.
If I run git rebase feat/foo
I should see we get a merge conflict and one that git doesn’t know how to resolve:
First, rewinding head to replay your work on top of it...
Applying: A to 9
Using index info to reconstruct a base tree...
M README.md
Falling back to patching base and 3-way merge...
Auto-merging README.md
CONFLICT (content): Merge conflict in README.md
error: Failed to merge in the changes.
Patch failed at 0001 A to 9
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".
We can see from the information git has given us that it first rewinded master
back to the first commit 75eb1cb
in order for it to place our feat/foo
commits on top of it (as that initial commit is where our branch originally forked from).
From there we can see once git replayed our feat/foo
commits on top of 75eb1cb
that it then tried to apply the new commit that feat/foo
didn’t have (e.g. Applying: A to 9
) and it failed to do so.
Git tells us that there was a merge conflict:
CONFLICT (content): Merge conflict in README.md
It’s up to us to open README.md
and to resolve the conflict ourself. When I open the file I see:
<<<<<<< 41d411564c1dc3106f03427d1b5920d05d95e037
- A: 1
- B: 2
- C: 3
||||||| merged common ancestors
- A: 1
=======
- A: 9
>>>>>>> A to 9
So the above shows the file is split into three:
<<<<<<< <commit_hash>
||||||| merged common ancestors
>>>>>>> <commit_message>
I know that I’m happy for the line - A: 1
(which was changed in my feat/foo
branch commit 41d4115
) to be changed to - A: 9
(which was changed in master
after I originally branched from it). So I manually make that change by deleting all the added noise (e.g. <<<<<<<
and ||||||| merged common ancestors
etc) so I’m left with just the content the file should be expected to have now.
I update it to look like:
- A: 9
- B: 2
- C: 3
I now must run the following commands:
git add README.md
(as I’ve made a change to the file at this point in time)git rebase --continue
We see that git is trying again now to apply the commit (but this time there is no merge conflict info inside of the README) and so we see the output:
Applying: A to 9
Now when looking at the output from git lg
I see:
* 7c001cd - (HEAD -> master) A to 9
* 41d4115 - (origin/feat/foo, feat/foo) Add C (also revert A)
* 9e5626c - Modify A
* 8e7965e - Add B
* 75eb1cb - (origin/master) README
This shows that the changes from feat/foo
where replayed directly on top of 75eb1cb
. Otherwise if we didn’t use git’s rebase feature but a standard git merge
, we could’ve ended up with a git history that looked like the following:
* 41d4115 - (origin/feat/foo, feat/foo) Add C (also revert A)
* 9e5626c - Modify A
* 8e7965e - Add B
* 65553e0 - (HEAD -> master) A to 9
* 75eb1cb - (origin/master) README
Notice the feat/foo
commits are on top of the A to 9
commit and that might not necessarily be what we want to have happen.
Note: it’s usually better to use
git pull --rebase <remote> <branch>
as this will ensure that you get the latest copy of changes for the specified branch (as apposed togit rebase <branch>
which will just be the local copy of that branch (remembergit pull
is an abstraction on top ofgit fetch
, thengit merge
).
git rebase --interactive
The --interactive
flag is useful for letting us rewrite our git history. We’re able to move the order of our commits as well as squash commits down and change their recorded message.
So let’s assume we want to squash all but the first commit in our feat/foo
branch. By that I mean we currently have:
* b4f9dfd - (HEAD -> feat/foo) Add C (also revert A)
* 7354a41 - Modify A
* c321b40 - Add B
* 75eb1cb - (origin/master) README
Let’s say we want “Add B”, “Modify A” and “Add C (also revert A)” squashed into one commit. To do this we need to locate the parent commit of the earliest commit we want to squash.
So “Add B” is the earliest commit we want as part of the squash, so the parent commit is “README”. To action the rebase let’s run the following command:
git rebase --interactive 75eb1cb
This drops us into an editor with the following output:
pick c321b40 Add B
pick 7354a41 Modify A
pick b4f9dfd Add C (also revert A)
# Rebase 75eb1cb..b4f9dfd onto 75eb1cb (3 command(s))
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
# d, drop = remove commit
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out
We can modify it like so:
pick c321b40 Add B
squash 7354a41 Modify A
squash b4f9dfd Add C (also revert A)
This will result in the following combined commit details:
# This is a combination of 3 commits.
# The first commit's message is:
Add B
# This is the 2nd commit message:
Modify A
# This is the 3rd commit message:
Add C (also revert A)
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Date: Sun May 15 17:29:32 2016 +0100
#
# interactive rebase in progress; onto 75eb1cb
# Last commands done (3 commands done):
# squash 7354a41 Modify A
# squash b4f9dfd Add C (also revert A)
# No commands remaining.
# You are currently editing a commit while rebasing branch 'feat/foo' on '75eb1cb'.
#
# Changes to be committed:
# modified: README.md
#
Now if we run git lg -p
we’ll see the new squashed commit does indeed contain all the previous commit’s contents:
* b63857d - (HEAD -> feat/foo) Add B (16 minutes ago)|
| diff --git a/README.md b/README.md
| index 428f59e..f2e26b6 100644
| --- a/README.md
| +++ b/README.md
| @@ -1 +1,3 @@
| - A: 1
| +- B: 2
| +- C: 3
git rebase --onto
Imagine we’ve merged our feat/foo
branch at this point into master
using:
git merge --squash feat/foo
Note: you’ll need to fix a conflict first for it to be successful
So master
should now have three commits:
* 19ec1bb - (HEAD -> master) Merge feat/foo
* 3fc460b - A to 9
* 75eb1cb - (origin/master) README
What’s the easiest way to delete the middle/second commit 3fc460b
? We could use git rebase --interactive
to delete the commit from history, but there is an alternative that’s much easier:
git rebase --onto 75eb1cb 3fc460b
Note: in this scenario you’ll get a conflict that you’ll need to resolve first (e.g. we’re removing a commit that sets A to the value 9 but that change was also pulled into the
feat/foo
branch so git isn’t sure whether you definitely want that change any more or not), but in most cases you’ll likely have a clean rebase
The basic structure of this command is:
git rebase --onto <commit_to_become_new_base> <commit_to_delete>
For more information see the documentation for git rebase
:
man git-rebase
- git-scm.com/docs/git-rebase
git format-patch
At this point you’re likely using a service such as GitHub or GitLab for creating projects and opening pull requests, as apposed to Git’s own native pull request feature which is substantially less feature rich than these commercial abstraction layers.
But sometimes just accepting a ‘patch’ from someone and being able to apply it quickly and easily is what you want to do. So that’s where git format-patch
comes in.
Imagine you have a centralised master
branch and someone has branched off from its HEAD
to a new branch called cool-new-features
and they would like you to merge their changes directly with the centralised repository’s master
branch.
This person would need to execute the following command:
git format-patch master
Note: you can swap the branch
master
for any valid commit, alias or range
What this will end up doing is generating a ‘patch’ file for each new commit that isn’t available in master. Below is an example patch file generated from a test repo I was messing around with, and which actually generated two patch files for me (this being the first one):
From 64a903d2ed6b4280d4a0914aaf50f014ae05cdd3 Mon Sep 17 00:00:00 2001
From: Integralist <mark.mcdx@gmail.com>
Date: Tue, 31 May 2016 08:28:56 +0100
Subject: [PATCH 1/2] G
---
foo.txt | 1 +
1 file changed, 1 insertion(+)
diff --git a/foo.txt b/foo.txt
index b1e6722..6f04b1d 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1,3 +1,4 @@
A
B
C
+G
--
2.7.4
Note: if you want a single patch file you can use
the--stdout
flag and redirect the output to a file
git format-patch master --stdout > new-feature.patch
The person who generates the patch file(s) will then need to send them to you (which can be done using git send-email
):
git send-email -to devlist@example.org 0001-A.patch
If it’s you sending the patch via git, then you may need to configure git to use your mail server details:
git config --global sendemail.smtpserver smtp.my-isp.com
git config --global sendemail.smtpserverport 465
You, as the recipient of the patch file(s), can then review and apply the patch using:
git checkout review-new-feature
cat new-feature.patch | git am # single patch file
cat *.patch | git am # multiple patch files
Also of interest, if using GitHub for Pull Requests, is that you can add a .patch
extension to the end of a PR path or commit path for it to generate a patch for you! So you can utilise GitHub for some of the nice ‘review’ features, but then utilise classic/traditional communication and application of patches if you so choose (maybe for an older/internal system).
So if you have a GitHub PR URL like https://github.com/my-org/my-repo/pull/123
, then you can convert this into a patch file using https://github.com/my-org/my-repo/pull/123.patch
Git also offers you the git apply
command to use in place of git am
. The reason being is that git am
actually commits the changes in the patch, whereas git apply
will only affect your working directory, so you’ll have the opportunity to stage and commit the changes however you like. Unless you use the --cached
or --index
flags (see man git-apply
for details).
Note:
git apply
also has a--reverse
flag to manipulate the order when applying multiple patchess
The other difference is that git am
only accepts patch files, whereas git apply
accepts patch files and also output from git diff
. So you have more options available to you that way. For example:
curl https://gist.githubusercontent.com/anonymous/x/raw/x/test.diff | git apply
Conclusion
There are so many aspects to merging commits and dealing with git’s commit history, that it’s difficult to cover everything without people having to mentally store too much information that most of the time you wont utilise.
For example, I’ve not covered anything to do with pulling commits: git pull --strategy
, git pull --squash
, git pull --rebase
, git pull --ff-only
and git pull --no-commit
. Each have their use cases, but I think sometimes you’re better picking a single strategy and defining it as a standard within your development team.
If you’re interested in one git workflow approach that utilises git’s rebasing feature, and I’ve used with success in the past at the BBC, then I recommend you have a read of this blog post I wrote a few years ago: integralist.co.uk/posts/github-workflow
I’ve also written about other types of git “workflows” as part of BBC News’ “Coding Best Practices” working group: github.com/bbc/news-coding-best-practices/git-workflow