git

Debugging with Git Bisect

Submitted by rfay on Fri, 2010-04-02 18:58

As most of you know, the marvelous git version control system is the future of the Drupal repository. You've probably heard people rave about lots of very important features: It's distributed, FAST, branching and merging are easy.

But today I'm here to rave about a more obscure wonderful feature of git: the git bisect command. With this command you can find out where in the history a particular bug was introduced. I made a short screencast to explain it:

The basic idea: The fastest way to understand a problem is to divide it in half, then divide that in half, etc. In search technology this is called binary search. In military terms it's called "divide and conquer". It's great. It's git bisect. You choose a version of your code that's broken. Then you choose a historical revision that's not broken. Then git just checks out for you the halfway points for you to test until you get to the actual answer to your question: in what commit was the change introduced?

One key feature of git that this showcases is its incredible speed. You could do all this (manually) with any revision control system. But how long would it take to do each checkout? Way too long to make it practical in most cases.

Git is cool. This is just one more reason.

Git over an ssh tunnel (like through a firewall or VPN)

Submitted by rfay on Mon, 2010-10-11 18:41

It's a treasured geek secret that ssh can tunnel TCP connections like ssh all over the internet. What does that mean? It means that you can access machines and ports from your local machine that you never thought you could, including git repositories that are behind firewalls or inside VPNs.

There are two steps to this. First, we have to set up an ssh tunnel. We have 3 machines we're interacting with here:

The local machine where we want to be able to do a git clone or git pull or whatever. I'll call it localhost.
The internet or VPN host that has access to your git repository. Let's call it proxy.example.com.
The host that has the git repository on it. We'll call it git.example.com, and we'll assume that access to git is via ssh on port 22, which is very common for git repos with commit access.

Step 1: Set up a tunnel (in one window). ssh -L3333:git.example.com:22 you@proxy.example.com

This ssh's you into proxy.example.com, but in the process sets up a TCP tunnel between your localhost port 3333 through the proxy internet host and to port 22 on git.example.com. (You can use any convenient port; 3333 is just an example.)

Note that we have to have permission to do this on the proxy.example.com server; the default is for it to be on, but it might not be. The relevant permission is PermitTunnel and it must be allowed (or omitted) in /etc/ssh/sshd_config (debian). But now we'll test to see whether it's working or not:

telnet localhost 3333

If you get an answer like SSH-2.0-OpenSSH_5.1p1 Debian-5 you are in business.
If "telnet" is not found, then install it with apt-get install telnet or yum install telnet or whatever for your distro.

Step 2: Use git with an ssh URL to connect through the tunnel to your behind-the-VPN git repo:
git clone ssh://git@localhost:3333/example.git

"git@" is the user you use to connect to the git repo; it might be something else, of course. And your project might be in a subdirectory, etc. So it might be git clone ssh://rfay@localhost:3333/project/someproject.git.

You should have a repo cloned.

Remember that you have to have the tunnel up in the future to do a pull or fetch or similar operation. You may want to look into the excellent autossh package that automatically establishes ssh connections and keeps them up.

There are lots of resources on ssh tunneling including this simple one. You can also use ssh -R to put a port on the machine you ssh into that will access your own local machine.

Shhhhh.. Don't tell anybody!

Avoiding Git Disasters: A Gory Story

Submitted by rfay on Sun, 2011-01-30 12:10

Edit 2015-08-30: The bottom line years later: Use the (Github's) Pull Request methodology, with a responsible person doing the pulls. You'll never have any of these problems.

I learned the hard way recently that there are some unexpectedly horrible things that can happen to a project in the Git source control management system due to its distributed nature... that I never would have thought of.

There is one huge difference between Git and older server-based systems like Subversion and CVS. That difference is that there's no server. There's (usually) an authoritative repository, but it's really fundamentally just a peer repository that gets stuff sent to it. OK, we all knew that. But that has some implications that aren't obvious at first. In Subversion, when you make a change, you just push that change up to the server, and the server handles applying just that change to the master copy of the project. However, in Git, and especially when using the default "merge workflow" (I'll write about merge workflow versus rebase workflow in another article), there are times when a single developer may be in charge of (and able to unintentionally break) the entire codebase all at once. So here I'm going to describe two ways that I know of that this can happen.

Disaster 1: git push --force

A normal push to the authoritative repository involves taking your new work as new commits and plopping those commits as-is on top of the branch in the repository. However, when a developer's local Git repository is not in sync with (or up-to-date with) the authoritative repository (the one we normally push to), then it can't do a fast-forward merge, and it will balk with an error message.

The right thing to do in this case is to either merge your code with a git pull or to rebase your code onto the HEAD with git pull --rebase, or to use any number of other similar techniques. The absolutely worst and wrong-est thing in the whole world is something that you can do with the default configuration: git push --force. A forced push overwrites the structure and sequence of commits on the authoritative repository, throwing away other people's commits. Yuck.

The default configuration in git, that git push --force is allowed. In most cases you should not ever allow that.

How do you prevent git push --force? (thanks to sdboyer!)

In the bare authoritative repository,

git config --system receive.denyNonFastForwards true

Disaster 2: Merging Without Understanding

This one is far more insidious. You can't just turn off a switch and prevent it, and if you use the merge workflow you're highly susceptible.

So let's say that your developers can't do the git push --force or would never consider doing so. But maybe there are 10 developers working hot and heavy on a project using the merge workflow.

In the merge workflow, everybody does work in their own repository, and then when it comes time to push, they do a git pull (which by default tries to merge into their code everything that's been one on the repository) and then they do a git push to push their work back up to the repo. But in the git pull all the work that has been done is merged on the developer's machine. And the results of that merge are then pushed back up as a potentially huge new commit.

The problem can come in that merge phase, which can be a big merge, merging in lots of commits. If the developer does not push back a good merge, or alters the merge in some way, then pushes it back, then the altered world that they push back becomes everybody else's HEAD. Yuck.

Here's the actual scenario that caused an enormous amount of hair pulling.

The team was using the merge workflow. Lots of people changing things really fast. The typical style was
- Work on your stuff
- Commit it locally
- git pull and hope for no conflicts
- git push as fast as you can before somebody else gets in there
Many of the team members were using Tortoise Git, which works fine, but they had migrated from Tortoise SVN without understanding the underlying differences between Git and Subversion.
Merge conflicts happened fairly often because so many people were doing so many things
One user of Tortoise Git would do a pull, have a merge conflict, resolve the merge conflict, and then look carefully at his list of files to be committed back when he was committing the results. There were lots of files there, and he knew that the merge conflict only involved a couple of files. For his commit, he unchecked all the other files changes that he was not involved in, committed the results and pushed the commit.
The result: All the commits by other people that had been done between this user's previous commit and this one were discarded

Oh, that is a very painful story.

How do you avoid this problem when using git?

Train your users. And when you train them make sure they understand the fundamental differences between Git and SVN or CVS.
Don't use the merge workflow. That doesn't solve every possible problem, but it does help because then merging is at the "merging my changes" level instead of the "merging the whole project" level. Again, I'll write another blog post about the rebase workflow.

Alternatives to the Merge Workflow

I know of two alternatives. The first is to rebase commits (locally) so you put your commits as clean commits on top of HEAD, on top of what other people have been doing, resulting in a fast-forward merge, which doesn't have all the merging going on.

The second alternative is promoted or assumed by Github and used widely by the Linux Core project (where Git came from). In that scenario, you don't let more than one maintainer push to the important branches on the authoritative repository. Users can clone the authoritative repository, but when they have changes to be made they request that the maintainer pull their changes from the contributor's own repository. This is called a "pull request". The end result is that you have one person controlling what goes into the repository. That one person can require correct merging behavior from contributors, or can sort it out herself. If a contribution comes in on a pull request that isn't rebased on top of head as a single commit, the maintainer can clean it up before committing it.

Conclusions

Avoid the merge workflow, especially if you have many committers or you have less-trained committers.

Understand how the distributed nature of git changes the game.

Turn on system receive.denyNonFastForwards on your authoritative repository

Many of you have far more experience with Git than I do, so I hope you'll chime in to express your opinions about solving these problems.

Many thanks and huge kudos to Marco Villegas (marvil07), the Git wizard who studied and helped me to understand what was going on in the Tortoise Git disaster. And thanks to our Drupal community Git migration wizard Sam Boyer (sdboyer) who listened with Marco to a number of pained explanations of the whole thing and also contributed to its solution.

Oh, did I mention I'm a huge fan of Git? Distributed development and topical branches have changed how I think about development. You could say it's changed my life. I love it. We just all have to understand the differences and deal with them realistically.

A Rebase Workflow for Git

Submitted by rfay on Sun, 2011-02-06 13:00

Update October, 2014: Lots of things have gotten easier over the years.
These days, the easy way to fix this set of things is with the Pull Request workflow, which is essentially the Integration Manager workflow discussed here (probably).

Use github or bitbucket or somebody that makes the PR workflow easy
Delegate a person as integration manager, who will pull or comment on the PR
Require contributors to rebase their own PR branch before pulling if there are conflicts.

Update: Just for clarification, I'm not opposed to merges. I'm only opposed to unintentional merges (especially with a git pull). This followup article describes a simple way to rebase most of the time without even thinking about it). Also, for local development I love the git merge --squash method described by joachim below.

In this post I'm going to try to get you to adopt a specific rebase-based workflow, and to avoid (mostly) the merge workflow.

What is the Merge Workflow?

The merge workflow consists of:

git commit -m "something"
git pull   # this does a merge from origin and may add a merge commit
git push   # Push back both my commit and the (possible) merge commit

Note that you normally are forced to do the pull unless you're the only committer and you committed the last commit.

Why Don't I Want the Merge Workflow?

As we saw in Avoiding Git Disasters, the multiple-committer merge workflow has very specific perils due to the fact that every committer for a time has responsibility for what the other committers have committed.

These are the problems with the merge workflow:

It has the potential for disaster, as that merge and merge commit have to be handled correctly by every committer. That said, most committers will have no trouble with it and will not mess it up. But if you have lots of committers, and they don't all understand Git, or they are using a GUI that hides the actual results from them, watch out.
Your history becomes a mess. It has all kinds of inexplicable merge commits (which you typically don't look inside to see what's there) and the history (gitk) becomes useless.
Debugging using git bisect is confused massively due to the merge commits.

When Is the Merge Workflow OK?

The merge workflow will do you no damage at all if you

Only have one committer (or a very small number of committers, and you trust them all)

and

You don't care much about reading your history.

OK, What is Rebasing?

First, definitions:

A branch is a separate line of work. You may have seen these before in other VCS's, but in Git they're so easy to use that they're addictive and life-altering. You can expose branches in the public repository (a public branch) or they may never get off of your machine (a topical branch).
A public branch is one that more than one person pulls from. In Drupal, 7.x-1.x for most modules and themes would be a public branch.
A topical branch (or feature branch) is a private branch that you alone are using, and will not exposed in the public repository.
A tracking branch is a local branch that knows where its remote is, and that can push to and pull from that remote. Assuming a remote named "origin" and a public branch named "7.x-1.x", we could create a tracking branch with git branch --track 7.x-1.x origin/7.x-1.x, or with newer versions of git, git checkout --track origin/7.x-1.x

The fundamental idea of rebasing is that you make sure that your commits go on top of the "public" branch, that you "rebase" them so that instead of being related to some commit way back when you started working on this feature, they get reworked a little so they go on top of what's there now.

Don't do your work on the public branch (Don't work on master or 6.x-1.x or whatever). Instead, work on a "topical" or "feature" branch, one that's devoted to what you want to do.
When you're ready to commit something, you rebase onto the public branch, plopping your work onto the very tip of the public branch, as if it were a single patch you were applying.

Here's the approach. We'll assume that we already have a tracking branch 7.x-1.x for the public 7.x-1.x branch.

git checkout 7.x-1.x  # Check out the "public" branch 
git pull              # Get the latest version from remote
git checkout -b comment_broken_links_101026  # topical branch
... # do stuff here.. Make commits.. test...
git fetch origin      # Update your repository's origin/ branches from remote repo
git rebase origin/7.x-1.x  # Plop our commits on top of everybody else's
git checkout 7.x-1.x  # Switch to the local tracking branch
git pull              # This won't result in a merge commit
git rebase comment_broken_links_101026  # Pull those commits over to the "public" branch
git push               # Push the public branch back up, with my stuff on the top

There are ways to simplify this, but I wanted to show it explicitly. The fundamental idea is that I as a developer am taking responsibility to make sure that my work goes right in on top of the everybody else's work. And that it "fits" there - that it doesn't require any magic or merge commits.

Using this technique, your work always goes on top of the public branch like a patch that is up-to-date with current HEAD. This is very much like the CVS patch workflow, and results in a clean history.

For extra credit, you can use git rebase -i and munge your commits into a single commit which has an excellent commit message, but I'm not going to go there today.

Merging and Merge Conflicts

Any time you do a rebase, you may have a merge conflict, in which Git doesn't know how to put your work on top of the work others have done. If you and others are working in different spaces and have your responsibilities well separated, this will happen rarely. But still, you have to know how to deal with it.

Every OS has good merge tools available which work beautifully with Git. Working from the command line you can use git mergetool when you have a conflict to resolve the conflict. We'll save that for another time.

Branch Cleanup

You can imagine that, using this workflow, you end up with all kinds of useless, abandoned topical branches. Yes you do. From time to time, clean them up with

git branch -d comment_broken_links_101026

or, if you haven't ever merged the topical branch (for example, if you just used it to prepare a patch)

git branch -D comment_broken_links_101026

Objections

If you read the help for git rebase it will tell you "Be careful. You shouldn't rewrite history that will be exposed publicly because everybody will hate you.". Note, though, that the way we're using rebase here, we only plop our commit(s) right on top, and then push. It does not change the public history. Of course there are other ways of using rebase that could change publicly-exposed history, and that is frowned upon.

Conclusion

This looks more complicated than the merge workflow. It is. It is not hard. It is valuable.

If you have improvements, suggestions, or alternate workflows to suggest, please post in the comments. If you find errors or things that can be stated more clearly or correctly, I'll fix the post.

I will follow up before long with a post on the "integration manager" workflow, which is essentially the github model. Everybody works in their own repositories, which are pseudo-private, and then when they have their work ready, they rebase it onto the public branch of the integration manager, push their work to the pseudo-private repo, and ask the integration manager to pull from it.

Resources

Reference (cache) repositories to speed up clones: git clone --reference

Submitted by rfay on Fri, 2011-02-18 12:08

Update: Please read this update on my experience before using the technique in this article.
Update: See the drush implementation of this approach in the comment below.

DamZ taught me a great new piece of git trivia today. You can use a local repository as a kind of cache for a git clone.

Let's create a reference repository for Drupal (and it will be bare, because we don't need any files checked out)

git clone --mirror git://git.drupal.org/project/drupal.git ~/gitcaches/drupal.reference

That makes a complete clone of Drupal's full history in ~/gitcaches/drupal.reference

Now when I need to clone the Drupal project's entire history (as I might do often in testing) I can

git clone --reference ~/gitcaches/drupal.reference git://git.drupal.org/project/drupal.git 

And the clone time is on the order of 2 seconds instead of several minutes. And yes, it picks up new changes that may have happened in the real remote repository.

To go beyond this (again from DamZ) we can have a reference repository that has many projects referenced within it.

mkdir -p ~/gitcaches/reference
cd ~/gitcaches/reference
git init --bare
for repo in drupal views cck examples panels  # whatever you want here
do
  git remote add $repo git://git.drupal.org/project/$repo.git
done
git fetch --all

Now I have just one big bare repo that I can use as a cache. I might update it from time to time with git fetch --all. But I don't have to. And I can use it like this:

cd /tmp
git clone --reference ~/gitcaches/reference git://git.drupal.org/project/drupal.git
git clone --reference ~/gitcaches/reference git://git.drupal.org/project/examples.git

We'll try to use this technique for the testbots, which do several clean checkouts per patch tested, as it should speed them up by at least a minute per test.

Edit: Here is the version that I used with the testbots, as it appears as a gist:

Pushing and Deleting Git Topic Branches (Screencast)

Submitted by rfay on Sun, 2011-02-27 20:23

Creating topic branches (also called "feature branches") is easy in Git and
is one of the best things about Git. Sometimes we also want to get those pushed
from our local repo for various reasons:

To make sure it's safe on another server (for backup).
To let others review it.
To let others build upon it.

Here we'll just be dealing with #1 and #2, and not talking about how to collaborate
on a shared branch. We'll be assuming that only the original author will push to it,
and that nobody else will be doing work based on that branch.

Creating a feature branch (using name/name_issue)

git checkout -b rfay/some_feature_39949494/05 OR git checkout -b rfay/some_feature_39949494
Pushing it up (with origin xxx)

git push origin rfay/some_feature_39949494
Turning a regular branch into a tracking branch (if you like shorter commands)

git config --global push.default tracking # One time setup. git branch --set-upstream foo upstream/foo
OR just create a new tracking branch and forget all that.

git branch -m [my_current_branch] junk git checkout --track origin/[branch]
OR if you have commit access to the upstream branch

git push -u upstream foo git push -u origin foo
When you need to, delete the branch
- Locally
  
  git branch -d [branch] # If it's been merged git branch -D [branch] # unconditionally
- Remote
  
  git push origin :[branch] # pretty odd syntax

Drupal Patching, Committing, and Squashing with Git

Submitted by rfay on Sat, 2011-03-05 20:31

Back in the bad old days (like 2 weeks ago) there was exactly one way to create patches and exactly one way to apply them. Now we have so many formats and so many ways to work it can be mind boggling. As a community, we're going to have to come to a consensus on which of these many options we prefer. I'm going to take a quick look at the options and how to work among them when necessary.

Ways to create patches

My preference when making changes is always to commit my changes whether or not the commits will be exposed later. That lets me keep track of what I'm doing, and make commit comments about my work. So in each if these cases we'll start with a feature branch based on origin/7.x-1.x with:

# Create a feature branch by branching from 7.x-1.x
git checkout -b my_feature_99999 origin/7.x-1.x
[edit A.txt]
git add .
git commit -m "Added A.txt"
[edit B.txt]
git add .
git commit -m "Added B.txt"

First, I may want to know what commits and changes I'm going to be including in this patch.

git log origin/7.x-1.x..HEAD shows me the commits that I've added.
git diff origin/7.x-1.x shows me the actual code changes I've done.

Now we can create a patch representing these changes in at least a couple of ways:

git diff origin/7.x-1.x >~/tmp/hypothetical.my_feature_99999_01.patch will create a patchfile which can be uploaded to Drupal.org.
git format-patch --stdout origin/7.x-1.x >~/tmp/hypothetical.my_feature_99999_01.patch will create a patchfile that includes sophisticated commit information allowing a maintainer to just apply the patch and fly - the commits you've created will automatically be applied (more later). Note that this type of patch has the email you used with git config user.email embedded in the patch, so if you post it on Drupal.org, it will be indexed by Google.

What do we do with the feature branch? Whatever. It's sitting there and can be used any number of ways in the future, or you can delete it. I tend to clean these up periodically, but not right away. git branch -D my_feature_99999 would delete this branch.

Ways to apply patches

I usually create a feature branch to apply and work with patches. This lets me make edits after the fact and have complete freedom. I don't have to keep track of what I've committed until I'm done. So in this case let's assume that I'm the maintainer and have received the patch created above.

Before we start, please
git config --global push.default tracking
which will allow a tracking branch to automatically push to its remote.

git checkout -b received_patch_99999/03 origin/7.x-1.x

This creates a named local branch which can push to the 7.x-1.x branch on the server.

There are at least three ways to apply patches:

patch -p1 < /path/to/patchfile.patch or patch -p1 -i /path/to/patchfile.patch will apply the differences. I typically commit the patch immediately, like:

git status git add *.txt git commit -m "Patch from user99 to fix 99999 comment 3"

I can then continue to work on this, but now I can differentiate my own edits from the original patch that was provided. I might made additional commits. As a maintainer, I can then rebase/squash and commit a nicely-formatted single commit at the end. (See below.)
git apply /path/to/patchfile.patch does exactly the same thing as patch -p1 so you can use the exact same workflow.
git am /path/to/patchfile.patch only works with patches created using git format-patch that contain commit information. But when you use it, it actually makes the exact commits (with the associated commit messages) that the original patcher made. You can then continue to make changes yourself, make additional commits, and then rebase/squash and commit a nicely formatted version of the patch (see below).

I could now push the commits with

git push  # If you have push.default set
OR
git push origin HEAD:7.x-1.x

Squashing, rebasing, and editing the commit message

Let's say that this patch works, we've committed whatever edits we want, and it's time to go ahead and fix it up and push it. Now we can rebase/squash it, fix up the commit message, and push the result.

Some things to know here: Although you may have heard that "rebasing is bad" it's not true. Rebasing commits that have already been made public and that might affect someone else is in fact bad. But here we have not done that. We are just using Git's greatest features to prepare a clean, single commit. It will not break anything, it will not rewrite anybody else's history.

Let's squash our all of our work into a single commit that has a good message: "Issue #99999 by user999: Added A.txt and B.txt"

# Make sure our local branch is up-to-date
git pull --rebase
# Now rebase against the branch we are working against.
git rebase -i origin/7.x-1.x

We'll have an editor session pop up with a rebase script specified, something like this:

    pick a5c1399 added A
    pick d3f45f7 Added B
   
    # Rebase c98c91a..d3f45f7 onto c98c91a
    #
    # Commands:
    #  p, pick = use commit
    #  r, reword = use commit, but edit the commit message
    #  e, edit = use commit, but stop for amending
    #  s, squash = use commit, but meld into previous commit
    #  f, fixup = like "squash", but discard this commit's log message
    #
    # If you remove a line here THAT COMMIT WILL BE LOST.
    # However, if you remove everything, the rebase will be aborted.
    #

To squash, all we have to do here is change the second and following commands of this little script from "pick" to "s" or "squash". Leave the top one as 'pick' and everything will be squashed into it. I'll change mine like this:

pick a5c1399 added A
s d3f45f7 Added B

and save the file and exit.

Then you get the chance to edit the commit message for the entire squashed commit. An editor session pops up with:

  # This is a combination of 2 commits.
  # The first commit's message is:
  
  added A
  
  # This is the 2nd commit message:
  
  Added B
  
  # Please enter the commit message for your changes. Lines starting
  # with '#' will be ignored, and an empty message aborts the commit.
  # Not currently on any branch.
  # Changes to be committed:
  #   (use "git reset HEAD <file>..." to unstage)
  #
  # new file:   A.txt
  # new file:   B.txt

At this point, everything in the commit message that starts with '#' will be a comment. You can just edit this file to have the commit that you want. I'll delete all the lines and put

Issue #99999 by user999: Added A.txt and B.txt

Now I can just

git push  # If push.default is set to tracking
OR
git push origin HEAD:7.x-1.x

Note: A maintainer who never gets lost in what they're doing and always finishes things sequentially doesn't absolutely need to create a new branch for all this. Instead of creating a branch with

git checkout -b received_patch_99999/03 origin/7.x-1.x

we could have done all this on a 7.x-1.x tracking branch. The only complexity is that we have to figure out what commit to rebase to, which we can figure out with git log

git checkout 7.x-1.x
git pull
git am /path/to/patchfile.patch
git rebase -i origin/7.x-1.x
git push

Rebasing to prepare a nicer single-commit patch

OK, so let's assume that the maintainers of this project ask for better-prepared patches because they don't care to rebase and never edit a commit, but rather just apply them. The patch contributor can do the exact same squashing process before creating the patch.

Above, when creating doing work to create a new patch, I did this:

  # Create a feature branch by branching from 7.x-1.x
  git checkout -b my_feature_99999 origin/7.x-1.x
  [edit A.txt]
  git add .
  git commit -m "Added A.txt"
  [edit B.txt]
  git add .
  git commit -m "Added B.txt"

There were two commits on my feature branch, and they have my typical, lousy, work-in-progress commit messages that I wouldn't want any maintainer to have to deal with. So I'll rebase/squash them into a single commit before creating my patch.

git rebase -i origin/7.x-1.x

Then I get the same exact options that the maintainer had in the section above, and follow the exact same procedure to consolidate them, and give a good commit message. Then,

git format-patch --stdout origin/7.x-1.x >~/tmp/hypothetical.my_feature_99999_01.patch

will make a very nice single-commit patch with a good message that the maintainers can use out of the box if they'd like to.

Summary

We have lots of options in creating and applying patches, but the git format-patch + git am + git rebase -i toolset is remarkably powerful, and we may be able to build a community consensus around this toolset.

Rebasing sounds hard and odd, and squashing really awful, but they're essentially the same thing as patching. In reality, they bring the patch workflow into the 21st gitury. One patch == one commit. Sure you can do lots of obscure things with them, but here we're just combining and cleaning up commits.

And yes, you've been warned not to rewrite publicly exposed history using rebase, but we're not doing that. We're preparing something to be made public so it's completely harmless.

Playing around with Git

Submitted by rfay on Mon, 2011-03-28 11:21

Even if you've just arrived into the Gitworld, you've already noticed that things are really fast and flexible. Some people claim that the most important thing about the distributed nature of Git is that you can work on an airplane, but I claim that's totally bogus. The coolest thing about the distributed nature of Git is that you can fool around all you want, and try a million things, and even mess up your repo as much as you want... and still recover. Here's how, with a screencast at the bottom.

With CVS or Subversion or any number of other VCS's, when you commit or branch, you have the potential of causing yourself and others pain forever. And there's no way around it. You just have to do your best and hope it comes out OK. But with Git, you can try anything and rehearse it in advance, in a total no-consequence environment. Here are some suggestions of where to start.

Setting up a git-play area

First, let's pretend that we're a committer of Examples project. To do this, we don't have to have any privileges. Let's just make a copy of the Examples repo that we do have privileges on. We'll do this by creating a copy of the repository on Drupal.org in a throw-away directory:

cd /tmp
git clone --mirror git://git.drupal.org/project/examples.git
git clone examples.git --branch master
cd examples

Now we have a repo where we can do anything we want, experiment with anything, and it can never get back to the server, even though we have full push access. There is no way you can do anything wrong using this setup - you're able to commit, but you're committing to a local clone of Examples project.

Note that you could also just cp -r a local repo to /tmp or some similar junk play area. You just have to be careful in that case because if you had commit access in the original repo, you do also in the copy. The reason I did the mirror clone above was to give us commit access to a bogus local repository with no consequences.

Hard Reset

git reset --hard <commit>
Sometimes you just want to give up your work (or redo it, or just replay it from the authoritative repository. Then git reset --hard is what you want. You can throw away one or several commits, blow away changes you have staged, etc. If the commits in question are still on another branch (or on your branch in the remote repository), then you're not even doing anything destructive, but just fiddling with your local.

So first, let's experiment with what happens when we use the destructive git reset --hard, which sets the repository back to the commit we name. Let's set it back 3 commits:

git reset --hard HEAD~3
git log

We have just destroyed 3 commits! But did we do any damage? Nope.

git pull

pulls it right back into our local repository. All we actually did was to remote the memory of those 3 commits from our local branch.

Or let's say that those commits were not in the remote repo. We can still do all this with no risk:

git checkout master
git checkout -b play_branch
git reset --hard HEAD~3
# Recover the commits from our original branch
git merge master

Resetting a commit so we can fix it up a bit

Let's say that all these commits are my own and they haven't been released into the wild yet. I'm going to rework the top commit just a bit because I didn't really like it. I essentially throw away the commit, but keep the files in my work tree.

git reset HEAD^

will undo the top commit, but leave the results of it in the working tree.

Amending a commit

Sometimes I have either messed up the commit message, forgotten to stage a file or a file deletion, or something of the like. It's just good to have another chance at the commit.

I can stage some additional stuff I want to commit just by doing a git add and then

git commit --amend

and the newly staged stuff gets added to the top commit, and I have the chance to change the commit message as well.

Yes, this is rewriting history, and it must be done only on commits that have not already been released into the wild.

Combining (squashing) 10 commits into one

Since we're playing let's combine 10 commits into one. This uses the rather exotic "rebase" command to do the exotic "squash" operation. But even though these sound forbidding, it's just a powerful way to combine many commits into one:

git rebase -i HEAD~10

We get the chance to turn the last 10 commits into any number of commits, or squash them into 1. To turn it all into one commit, change lines 2-10 from "pick" into an "s" (for "squash").

Cherry-picking

Let's make a new branch, go back in history by 10 commits, and then cherry pick some of the original commits that were on this branch back onto it. It's easy.

git log    # Take a look at the commits
git checkout -b my_fiddle_branch  # Make a play branch
git reset --hard HEAD~10  # Kill off the top 10 commits
git cherry-pick master^  # Take the next-to-top commit on master and apply it on this branch

Playing Around with Git from Randy Fay on Vimeo.

Handling the new .gitignore file in D7 and D8

Submitted by rfay on Tue, 2011-05-17 14:46

Drupal 7 and Drupal 8 recently added a default (and sensible) .gitignore file to the standard repository, and while this solves some problems, it has also caused some confusion. (issue)

Here's a link to the actual new .gitignore. Essentially, it excludes the sites/default/files and sites/default/settings.php files from git source control.

What problems does having a default .gitignore solve?

The biggest problem it solves is that patches submitted to drupal.org were accidentally including things that they never should have included (like people's settings.php files) We just don't need that information, thank you very much.
It also sets a "best practice" for not source-controlling your files directory. Since the files directory is website-generated or user-generated content, it doesn't make any sense to put that in your git repository; most people have long come to a consensus on this, although not all agree.

What problems does having a default .gitignore create?

Mostly the problems created have to do with developer workflow.

A dev site may contain lots of deliberately uncontrolled modules or themes or libraries.
A site may want a completely different .gitignore due to various policy differences from the default.

How do I solve these problems?

Lots of ways:

If you don't want sites/all to be controlled at all (you want to ignore all modules and themes and libraries), add a file at sites/all/.gitignore with the contents a single line containing nothing but *
Simply change the .gitignore and commit the change. You won't be pushing it up to 7.x right?
If you track core code using downloads (and not git) you can simply change the .gitignore and check it into your own VCS.
Add extra things into .git/info/exclude. This basically works like .gitignore (it has good examples in it) and is not under source control at all.
Add an additional master gitignore capability with git config core.excludesfile .gitignore.custom and then put additional exclusions in the .gitignore.custom file.

Note that only 1 and 2 are completely source-controlled. In other words, #3, 4, and 5 would have a slight bit of configuration on a deployment site to work correctly, but they work perfectly for a random dev site.

How do I exclude things from Git that should always be excluded (Eclipse .project files, Netbeans nbproject directories, .orig files, etc.)?

You can have a global kill file. I use ~/.gitignore for things that I don't ever want to see show up as untracked files. You can activate that with git config --global core.excludesfile ~/.gitignore

Simpler Rebasing (avoiding unintentional merge commits)

Submitted by rfay on Sat, 2011-05-21 20:21

Executive summary: Set up to automatically use a `git pull --rebase`

Please just do this if you do nothing else:

git config --global pull.rebase true

About rebasing and pulling

Edit 2021-06-17: Updated the suggested command from the old git config --global branch.autosetuprebase always to the current (from a long time) git config --global pull.rebase true

I've written a couple of articles on rebasing and why to do it, but it does seem that the approaches can be too complex for some workflows. So I'd like to propose a simpler rebase approach and reiterate why rebasing is important in a number of situations.

Here's a simple way to avoid evil merge commits but not do the fancier topic branch approaches:

Go ahead and work on the branch you commit on (say 7.x-1.x)
Make sure that when you pull you do it with git pull --rebase
Push when you need to.

That's it.

Reasons to rebase to keep your history clean

There are two major reasons not to go with the default "git pull" merge technique.

Unintentional merge commits are evil: As described in the Git disasters article, doing the default git pull with a merge can result in merge commits that hide information, present cryptic merge commits like "Merge branch '7.x' of /home/rfay/workspace/d7git into 7.x" which explain nothing at all, and may contain toxic changes that you didn't intend. There is nothing wrong with merges or merge commits if you intend to present a merge. But having garbage merges every time you do a pull is really a mess, as described in the article above.
Intentional merge commits are OK if you really want that branch history there: If a significant, long-term piece of work has gone on and should be shown as a branch in the future history of the project, then go ahead and merge it before you commit, using git merge --no-ff to show a specific intentional merge. However, if the work you're committing is really a single piece of work (the equivalent of a Drupal.org patch), then why should it show up in the long-term history of the project as a branch and a merge?

I'll write again about merging and rebasing workflows, but for now we're just going to deal with #1: The case where you share a branch with others and you want to pull in their work without generating unintentional merge commits.

How to use git pull --rebase

The first and most important thing if you're a committer working on a branch shared with other committers is to never use the default git pull. Use the rebase version, git pull --rebase. That takes your commits that are not on the remote version of your branch and reworks/rebases them so that they're ahead of (on top of) the new commits you pull in with your pull.

Automatically using git pull --rebase

It's very easy to set up so that you don't ever accidentally use the merge-based pull. To configure a single branch this way:

git config branch.7.x.rebase true

To set it up so every branch you ever create on any repository is set to pull with rebase:

git config --global pull.rebase true

That should do the trick. Using this technique, no matter whether you use techniques to keep your history linear or not, whether you use topic branches or not, whether you squash or not, you won't end up with git disasters.

Bottom line: No matter what you do, please use git pull --rebase. To do that automatically forever without thinking,

git config --global pull.rebase true