git rebase

A Rebase Workflow for Git

Update October, 2014: Lots of things have gotten easier over the years.
These days, the easy way to fix this set of things is with the Pull Request workflow, which is essentially the Integration Manager workflow discussed here (probably).

  • Use github or bitbucket or somebody that makes the PR workflow easy
  • Delegate a person as integration manager, who will pull or comment on the PR
  • Require contributors to rebase their own PR branch before pulling if there are conflicts.

Update: Just for clarification, I'm not opposed to merges. I'm only opposed to unintentional merges (especially with a git pull). This followup article describes a simple way to rebase most of the time without even thinking about it). Also, for local development I love the git merge --squash method described by joachim below.

In this post I'm going to try to get you to adopt a specific rebase-based workflow, and to avoid (mostly) the merge workflow.

What is the Merge Workflow?

The merge workflow consists of:

git commit -m "something"
git pull   # this does a merge from origin and may add a merge commit
git push   # Push back both my commit and the (possible) merge commit

Note that you normally are forced to do the pull unless you're the only committer and you committed the last commit.

Why Don't I Want the Merge Workflow?

As we saw in Avoiding Git Disasters, the multiple-committer merge workflow has very specific perils due to the fact that every committer for a time has responsibility for what the other committers have committed.

These are the problems with the merge workflow:

  • It has the potential for disaster, as that merge and merge commit have to be handled correctly by every committer. That said, most committers will have no trouble with it and will not mess it up. But if you have lots of committers, and they don't all understand Git, or they are using a GUI that hides the actual results from them, watch out.
  • Your history becomes a mess. It has all kinds of inexplicable merge commits (which you typically don't look inside to see what's there) and the history (gitk) becomes useless.
  • Debugging using git bisect is confused massively due to the merge commits.

When Is the Merge Workflow OK?

The merge workflow will do you no damage at all if you

  • Only have one committer (or a very small number of committers, and you trust them all)

and

  • You don't care much about reading your history.

OK, What is Rebasing?

First, definitions:

  • A branch is a separate line of work. You may have seen these before in other VCS's, but in Git they're so easy to use that they're addictive and life-altering. You can expose branches in the public repository (a public branch) or they may never get off of your machine (a topical branch).
  • A public branch is one that more than one person pulls from. In Drupal, 7.x-1.x for most modules and themes would be a public branch.
  • A topical branch (or feature branch) is a private branch that you alone are using, and will not exposed in the public repository.
  • A tracking branch is a local branch that knows where its remote is, and that can push to and pull from that remote. Assuming a remote named "origin" and a public branch named "7.x-1.x", we could create a tracking branch with git branch --track 7.x-1.x origin/7.x-1.x, or with newer versions of git, git checkout --track origin/7.x-1.x

The fundamental idea of rebasing is that you make sure that your commits go on top of the "public" branch, that you "rebase" them so that instead of being related to some commit way back when you started working on this feature, they get reworked a little so they go on top of what's there now.

  1. Don't do your work on the public branch (Don't work on master or 6.x-1.x or whatever). Instead, work on a "topical" or "feature" branch, one that's devoted to what you want to do.
  2. When you're ready to commit something, you rebase onto the public branch, plopping your work onto the very tip of the public branch, as if it were a single patch you were applying.

Here's the approach. We'll assume that we already have a tracking branch 7.x-1.x for the public 7.x-1.x branch.

git checkout 7.x-1.x  # Check out the "public" branch
git pull              # Get the latest version from remote
git checkout -b comment_broken_links_101026  # topical branch
... # do stuff here.. Make commits.. test...
git fetch origin      # Update your repository's origin/ branches from remote repo
git rebase origin/7.x-1.x  # Plop our commits on top of everybody else's
git checkout 7.x-1.x  # Switch to the local tracking branch
git pull              # This won't result in a merge commit
git rebase comment_broken_links_101026  # Pull those commits over to the "public" branch
git push               # Push the public branch back up, with my stuff on the top

There are ways to simplify this, but I wanted to show it explicitly. The fundamental idea is that I as a developer am taking responsibility to make sure that my work goes right in on top of the everybody else's work. And that it "fits" there - that it doesn't require any magic or merge commits.

Using this technique, your work always goes on top of the public branch like a patch that is up-to-date with current HEAD. This is very much like the CVS patch workflow, and results in a clean history.

For extra credit, you can use git rebase -i and munge your commits into a single commit which has an excellent commit message, but I'm not going to go there today.

Merging and Merge Conflicts

Any time you do a rebase, you may have a merge conflict, in which Git doesn't know how to put your work on top of the work others have done. If you and others are working in different spaces and have your responsibilities well separated, this will happen rarely. But still, you have to know how to deal with it.

Every OS has good merge tools available which work beautifully with Git. Working from the command line you can use git mergetool when you have a conflict to resolve the conflict. We'll save that for another time.

Branch Cleanup

You can imagine that, using this workflow, you end up with all kinds of useless, abandoned topical branches. Yes you do. From time to time, clean them up with

git branch -d comment_broken_links_101026

or, if you haven't ever merged the topical branch (for example, if you just used it to prepare a patch)

git branch -D comment_broken_links_101026

Objections

If you read the help for git rebase it will tell you "Be careful. You shouldn't rewrite history that will be exposed publicly because everybody will hate you.". Note, though, that the way we're using rebase here, we only plop our commit(s) right on top, and then push. It does not change the public history. Of course there are other ways of using rebase that could change publicly-exposed history, and that is frowned upon.

Conclusion

This looks more complicated than the merge workflow. It is. It is not hard. It is valuable.

If you have improvements, suggestions, or alternate workflows to suggest, please post in the comments. If you find errors or things that can be stated more clearly or correctly, I'll fix the post.

I will follow up before long with a post on the "integration manager" workflow, which is essentially the github model. Everybody works in their own repositories, which are pseudo-private, and then when they have their work ready, they rebase it onto the public branch of the integration manager, push their work to the pseudo-private repo, and ask the integration manager to pull from it.

Resources

Drupal Patching, Committing, and Squashing with Git

Back in the bad old days (like 2 weeks ago) there was exactly one way to create patches and exactly one way to apply them. Now we have so many formats and so many ways to work it can be mind boggling. As a community, we're going to have to come to a consensus on which of these many options we prefer. I'm going to take a quick look at the options and how to work among them when necessary.

Ways to create patches

My preference when making changes is always to commit my changes whether or not the commits will be exposed later. That lets me keep track of what I'm doing, and make commit comments about my work. So in each if these cases we'll start with a feature branch based on origin/7.x-1.x with:

# Create a feature branch by branching from 7.x-1.x
git checkout -b my_feature_99999 origin/7.x-1.x
[edit A.txt]
git add .
git commit -m "Added A.txt"
[edit B.txt]
git add .
git commit -m "Added B.txt"

First, I may want to know what commits and changes I'm going to be including in this patch.

  • git log origin/7.x-1.x..HEAD shows me the commits that I've added.
  • git diff origin/7.x-1.x shows me the actual code changes I've done.

Now we can create a patch representing these changes in at least a couple of ways:

  • git diff origin/7.x-1.x >~/tmp/hypothetical.my_feature_99999_01.patch will create a patchfile which can be uploaded to Drupal.org.
  • git format-patch --stdout origin/7.x-1.x >~/tmp/hypothetical.my_feature_99999_01.patch will create a patchfile that includes sophisticated commit information allowing a maintainer to just apply the patch and fly - the commits you've created will automatically be applied (more later). Note that this type of patch has the email you used with git config user.email embedded in the patch, so if you post it on Drupal.org, it will be indexed by Google.

What do we do with the feature branch? Whatever. It's sitting there and can be used any number of ways in the future, or you can delete it. I tend to clean these up periodically, but not right away. git branch -D my_feature_99999 would delete this branch.

Ways to apply patches

I usually create a feature branch to apply and work with patches. This lets me make edits after the fact and have complete freedom. I don't have to keep track of what I've committed until I'm done. So in this case let's assume that I'm the maintainer and have received the patch created above.

Before we start, please
git config --global push.default tracking
which will allow a tracking branch to automatically push to its remote.

git checkout -b received_patch_99999/03 origin/7.x-1.x

This creates a named local branch which can push to the 7.x-1.x branch on the server.

There are at least three ways to apply patches:

  • patch -p1 < /path/to/patchfile.patch or patch -p1 -i /path/to/patchfile.patch will apply the differences. I typically commit the patch immediately, like:
      git status
      git add *.txt
      git commit -m "Patch from user99 to fix 99999 comment 3"
     

    I can then continue to work on this, but now I can differentiate my own edits from the original patch that was provided. I might made additional commits. As a maintainer, I can then rebase/squash and commit a nicely-formatted single commit at the end. (See below.)

  • git apply /path/to/patchfile.patch does exactly the same thing as patch -p1 so you can use the exact same workflow.
  • git am /path/to/patchfile.patch only works with patches created using git format-patch that contain commit information. But when you use it, it actually makes the exact commits (with the associated commit messages) that the original patcher made. You can then continue to make changes yourself, make additional commits, and then rebase/squash and commit a nicely formatted version of the patch (see below).

I could now push the commits with

git push  # If you have push.default set
OR
git push origin HEAD:7.x-1.x

Squashing, rebasing, and editing the commit message

Let's say that this patch works, we've committed whatever edits we want, and it's time to go ahead and fix it up and push it. Now we can rebase/squash it, fix up the commit message, and push the result.

Some things to know here: Although you may have heard that "rebasing is bad" it's not true. Rebasing commits that have already been made public and that might affect someone else is in fact bad. But here we have not done that. We are just using Git's greatest features to prepare a clean, single commit. It will not break anything, it will not rewrite anybody else's history.

Let's squash our all of our work into a single commit that has a good message: "Issue #99999 by user999: Added A.txt and B.txt"

# Make sure our local branch is up-to-date
git pull --rebase
# Now rebase against the branch we are working against.
git rebase -i origin/7.x-1.x

We'll have an editor session pop up with a rebase script specified, something like this:

    pick a5c1399 added A
    pick d3f45f7 Added B
  
    # Rebase c98c91a..d3f45f7 onto c98c91a
    #
    # Commands:
    #  p, pick = use commit
    #  r, reword = use commit, but edit the commit message
    #  e, edit = use commit, but stop for amending
    #  s, squash = use commit, but meld into previous commit
    #  f, fixup = like "squash", but discard this commit's log message
    #
    # If you remove a line here THAT COMMIT WILL BE LOST.
    # However, if you remove everything, the rebase will be aborted.
    #

To squash, all we have to do here is change the second and following commands of this little script from "pick" to "s" or "squash". Leave the top one as 'pick' and everything will be squashed into it. I'll change mine like this:

pick a5c1399 added A
s d3f45f7 Added B

and save the file and exit.

Then you get the chance to edit the commit message for the entire squashed commit. An editor session pops up with:

  # This is a combination of 2 commits.
  # The first commit's message is:
 
  added A
 
  # This is the 2nd commit message:
 
  Added B
 
  # Please enter the commit message for your changes. Lines starting
  # with '#' will be ignored, and an empty message aborts the commit.
  # Not currently on any branch.
  # Changes to be committed:
  #   (use "git reset HEAD <file>..." to unstage)
  #
  # new file:   A.txt
  # new file:   B.txt

At this point, everything in the commit message that starts with '#' will be a comment. You can just edit this file to have the commit that you want. I'll delete all the lines and put

Issue #99999 by user999: Added A.txt and B.txt

Now I can just

git push  # If push.default is set to tracking
OR
git push origin HEAD:7.x-1.x

Note: A maintainer who never gets lost in what they're doing and always finishes things sequentially doesn't absolutely need to create a new branch for all this. Instead of creating a branch with

git checkout -b received_patch_99999/03 origin/7.x-1.x

we could have done all this on a 7.x-1.x tracking branch. The only complexity is that we have to figure out what commit to rebase to, which we can figure out with git log

git checkout 7.x-1.x
git pull
git am /path/to/patchfile.patch
git rebase -i origin/7.x-1.x
git push

Rebasing to prepare a nicer single-commit patch

OK, so let's assume that the maintainers of this project ask for better-prepared patches because they don't care to rebase and never edit a commit, but rather just apply them. The patch contributor can do the exact same squashing process before creating the patch.

Above, when creating doing work to create a new patch, I did this:

  # Create a feature branch by branching from 7.x-1.x
  git checkout -b my_feature_99999 origin/7.x-1.x
  [edit A.txt]
  git add .
  git commit -m "Added A.txt"
  [edit B.txt]
  git add .
  git commit -m "Added B.txt"

There were two commits on my feature branch, and they have my typical, lousy, work-in-progress commit messages that I wouldn't want any maintainer to have to deal with. So I'll rebase/squash them into a single commit before creating my patch.

git rebase -i origin/7.x-1.x

Then I get the same exact options that the maintainer had in the section above, and follow the exact same procedure to consolidate them, and give a good commit message. Then,

git format-patch --stdout origin/7.x-1.x >~/tmp/hypothetical.my_feature_99999_01.patch

will make a very nice single-commit patch with a good message that the maintainers can use out of the box if they'd like to.

Summary

We have lots of options in creating and applying patches, but the git format-patch + git am + git rebase -i toolset is remarkably powerful, and we may be able to build a community consensus around this toolset.

Rebasing sounds hard and odd, and squashing really awful, but they're essentially the same thing as patching. In reality, they bring the patch workflow into the 21st gitury. One patch == one commit. Sure you can do lots of obscure things with them, but here we're just combining and cleaning up commits.

And yes, you've been warned not to rewrite publicly exposed history using rebase, but we're not doing that. We're preparing something to be made public so it's completely harmless.

Playing around with Git

Even if you've just arrived into the Gitworld, you've already noticed that things are really fast and flexible. Some people claim that the most important thing about the distributed nature of Git is that you can work on an airplane, but I claim that's totally bogus. The coolest thing about the distributed nature of Git is that you can fool around all you want, and try a million things, and even mess up your repo as much as you want... and still recover. Here's how, with a screencast at the bottom.

With CVS or Subversion or any number of other VCS's, when you commit or branch, you have the potential of causing yourself and others pain forever. And there's no way around it. You just have to do your best and hope it comes out OK. But with Git, you can try anything and rehearse it in advance, in a total no-consequence environment. Here are some suggestions of where to start.

Setting up a git-play area

First, let's pretend that we're a committer of Examples project. To do this, we don't have to have any privileges. Let's just make a copy of the Examples repo that we do have privileges on. We'll do this by creating a copy of the repository on Drupal.org in a throw-away directory:

cd /tmp
git clone --mirror git://git.drupal.org/project/examples.git
git clone examples.git --branch master
cd examples

Now we have a repo where we can do anything we want, experiment with anything, and it can never get back to the server, even though we have full push access. There is no way you can do anything wrong using this setup - you're able to commit, but you're committing to a local clone of Examples project.

Note that you could also just cp -r a local repo to /tmp or some similar junk play area. You just have to be careful in that case because if you had commit access in the original repo, you do also in the copy. The reason I did the mirror clone above was to give us commit access to a bogus local repository with no consequences.

Hard Reset

git reset --hard <commit>
Sometimes you just want to give up your work (or redo it, or just replay it from the authoritative repository. Then git reset --hard is what you want. You can throw away one or several commits, blow away changes you have staged, etc. If the commits in question are still on another branch (or on your branch in the remote repository), then you're not even doing anything destructive, but just fiddling with your local.

So first, let's experiment with what happens when we use the destructive git reset --hard, which sets the repository back to the commit we name. Let's set it back 3 commits:

git reset --hard HEAD~3
git log

We have just destroyed 3 commits! But did we do any damage? Nope.

git pull

pulls it right back into our local repository. All we actually did was to remote the memory of those 3 commits from our local branch.

Or let's say that those commits were not in the remote repo. We can still do all this with no risk:

git checkout master
git checkout -b play_branch
git reset --hard HEAD~3

# Recover the commits from our original branch
git merge master

Resetting a commit so we can fix it up a bit

Let's say that all these commits are my own and they haven't been released into the wild yet. I'm going to rework the top commit just a bit because I didn't really like it. I essentially throw away the commit, but keep the files in my work tree.

git reset HEAD^

will undo the top commit, but leave the results of it in the working tree.

Amending a commit

Sometimes I have either messed up the commit message, forgotten to stage a file or a file deletion, or something of the like. It's just good to have another chance at the commit.

I can stage some additional stuff I want to commit just by doing a git add and then

git commit --amend

and the newly staged stuff gets added to the top commit, and I have the chance to change the commit message as well.

Yes, this is rewriting history, and it must be done only on commits that have not already been released into the wild.

Combining (squashing) 10 commits into one

Since we're playing let's combine 10 commits into one. This uses the rather exotic "rebase" command to do the exotic "squash" operation. But even though these sound forbidding, it's just a powerful way to combine many commits into one:

git rebase -i HEAD~10

We get the chance to turn the last 10 commits into any number of commits, or squash them into 1. To turn it all into one commit, change lines 2-10 from "pick" into an "s" (for "squash").

Cherry-picking

Let's make a new branch, go back in history by 10 commits, and then cherry pick some of the original commits that were on this branch back onto it. It's easy.

git log    # Take a look at the commits
git checkout -b my_fiddle_branch  # Make a play branch
git reset --hard HEAD~10  # Kill off the top 10 commits
git cherry-pick master^  # Take the next-to-top commit on master and apply it on this branch

Playing Around with Git from Randy Fay on Vimeo.

Simpler Rebasing (avoiding unintentional merge commits)

Executive summary: Set up to automatically use a git pull --rebase

Please just do this if you do nothing else:

git config --global pull.rebase true

About rebasing and pulling

Edit 2021-06-17: Updated the suggested command from the old git config --global branch.autosetuprebase always to the current (from a long time) git config --global pull.rebase true

I've written a couple of articles on rebasing and why to do it, but it does seem that the approaches can be too complex for some workflows. So I'd like to propose a simpler rebase approach and reiterate why rebasing is important in a number of situations.

Here's a simple way to avoid evil merge commits but not do the fancier topic branch approaches:

  • Go ahead and work on the branch you commit on (say 7.x-1.x)
  • Make sure that when you pull you do it with git pull --rebase
  • Push when you need to.

That's it.

Reasons to rebase to keep your history clean

There are two major reasons not to go with the default "git pull" merge technique.

  1. Unintentional merge commits are evil: As described in the Git disasters article, doing the default git pull with a merge can result in merge commits that hide information, present cryptic merge commits like "Merge branch '7.x' of /home/rfay/workspace/d7git into 7.x" which explain nothing at all, and may contain toxic changes that you didn't intend. There is nothing wrong with merges or merge commits if you intend to present a merge. But having garbage merges every time you do a pull is really a mess, as described in the article above.
  2. Intentional merge commits are OK if you really want that branch history there: If a significant, long-term piece of work has gone on and should be shown as a branch in the future history of the project, then go ahead and merge it before you commit, using git merge --no-ff to show a specific intentional merge. However, if the work you're committing is really a single piece of work (the equivalent of a Drupal.org patch), then why should it show up in the long-term history of the project as a branch and a merge?

I'll write again about merging and rebasing workflows, but for now we're just going to deal with #1: The case where you share a branch with others and you want to pull in their work without generating unintentional merge commits.

How to use git pull --rebase

The first and most important thing if you're a committer working on a branch shared with other committers is to never use the default git pull. Use the rebase version, git pull --rebase. That takes your commits that are not on the remote version of your branch and reworks/rebases them so that they're ahead of (on top of) the new commits you pull in with your pull.

Automatically using git pull --rebase

It's very easy to set up so that you don't ever accidentally use the merge-based pull. To configure a single branch this way:

git config branch.7.x.rebase true

To set it up so every branch you ever create on any repository is set to pull with rebase:

git config --global pull.rebase true

That should do the trick. Using this technique, no matter whether you use techniques to keep your history linear or not, whether you use topic branches or not, whether you squash or not, you won't end up with git disasters.

Bottom line: No matter what you do, please use git pull --rebase. To do that automatically forever without thinking,

git config --global pull.rebase true
Subscribe to git rebase