Avoiding Git Disasters: A Gory Story

Sun, 2011-01-30 12:10 by rfay

Edit 2015-08-30: The bottom line years later: Use the (Github's) Pull Request methodology, with a responsible person doing the pulls. You'll never have any of these problems.

I learned the hard way recently that there are some unexpectedly horrible things that can happen to a project in the Git source control management system due to its distributed nature... that I never would have thought of.

There is one huge difference between Git and older server-based systems like Subversion and CVS. That difference is that there's no server. There's (usually) an authoritative repository, but it's really fundamentally just a peer repository that gets stuff sent to it. OK, we all knew that. But that has some implications that aren't obvious at first. In Subversion, when you make a change, you just push that change up to the server, and the server handles applying just that change to the master copy of the project. However, in Git, and especially when using the default "merge workflow" (I'll write about merge workflow versus rebase workflow in another article), there are times when a single developer may be in charge of (and able to unintentionally break) the entire codebase all at once. So here I'm going to describe two ways that I know of that this can happen.

Disaster 1: git push --force

A normal push to the authoritative repository involves taking your new work as new commits and plopping those commits as-is on top of the branch in the repository. However, when a developer's local Git repository is not in sync with (or up-to-date with) the authoritative repository (the one we normally push to), then it can't do a fast-forward merge, and it will balk with an error message.

The right thing to do in this case is to either merge your code with a git pull or to rebase your code onto the HEAD with git pull --rebase, or to use any number of other similar techniques. The absolutely worst and wrong-est thing in the whole world is something that you can do with the default configuration: git push --force. A forced push overwrites the structure and sequence of commits on the authoritative repository, throwing away other people's commits. Yuck.

The default configuration in git, that git push --force is allowed. In most cases you should not ever allow that.

How do you prevent git push --force? (thanks to sdboyer!)

In the bare authoritative repository,

git config --system receive.denyNonFastForwards true

Disaster 2: Merging Without Understanding

This one is far more insidious. You can't just turn off a switch and prevent it, and if you use the merge workflow you're highly susceptible.

So let's say that your developers can't do the git push --force or would never consider doing so. But maybe there are 10 developers working hot and heavy on a project using the merge workflow.

In the merge workflow, everybody does work in their own repository, and then when it comes time to push, they do a git pull (which by default tries to merge into their code everything that's been one on the repository) and then they do a git push to push their work back up to the repo. But in the git pull all the work that has been done is merged on the developer's machine. And the results of that merge are then pushed back up as a potentially huge new commit.

The problem can come in that merge phase, which can be a big merge, merging in lots of commits. If the developer does not push back a good merge, or alters the merge in some way, then pushes it back, then the altered world that they push back becomes everybody else's HEAD. Yuck.

Here's the actual scenario that caused an enormous amount of hair pulling.

The team was using the merge workflow. Lots of people changing things really fast. The typical style was
- Work on your stuff
- Commit it locally
- git pull and hope for no conflicts
- git push as fast as you can before somebody else gets in there
Many of the team members were using Tortoise Git, which works fine, but they had migrated from Tortoise SVN without understanding the underlying differences between Git and Subversion.
Merge conflicts happened fairly often because so many people were doing so many things
One user of Tortoise Git would do a pull, have a merge conflict, resolve the merge conflict, and then look carefully at his list of files to be committed back when he was committing the results. There were lots of files there, and he knew that the merge conflict only involved a couple of files. For his commit, he unchecked all the other files changes that he was not involved in, committed the results and pushed the commit.
The result: All the commits by other people that had been done between this user's previous commit and this one were discarded

Oh, that is a very painful story.

How do you avoid this problem when using git?

Train your users. And when you train them make sure they understand the fundamental differences between Git and SVN or CVS.
Don't use the merge workflow. That doesn't solve every possible problem, but it does help because then merging is at the "merging my changes" level instead of the "merging the whole project" level. Again, I'll write another blog post about the rebase workflow.

Alternatives to the Merge Workflow

I know of two alternatives. The first is to rebase commits (locally) so you put your commits as clean commits on top of HEAD, on top of what other people have been doing, resulting in a fast-forward merge, which doesn't have all the merging going on.

The second alternative is promoted or assumed by Github and used widely by the Linux Core project (where Git came from). In that scenario, you don't let more than one maintainer push to the important branches on the authoritative repository. Users can clone the authoritative repository, but when they have changes to be made they request that the maintainer pull their changes from the contributor's own repository. This is called a "pull request". The end result is that you have one person controlling what goes into the repository. That one person can require correct merging behavior from contributors, or can sort it out herself. If a contribution comes in on a pull request that isn't rebased on top of head as a single commit, the maintainer can clean it up before committing it.

Conclusions

Avoid the merge workflow, especially if you have many committers or you have less-trained committers.

Understand how the distributed nature of git changes the game.

Turn on system receive.denyNonFastForwards on your authoritative repository

Many of you have far more experience with Git than I do, so I hope you'll chime in to express your opinions about solving these problems.

Many thanks and huge kudos to Marco Villegas (marvil07), the Git wizard who studied and helped me to understand what was going on in the Tortoise Git disaster. And thanks to our Drupal community Git migration wizard Sam Boyer (sdboyer) who listened with Marco to a number of pained explanations of the whole thing and also contributed to its solution.

Oh, did I mention I'm a huge fan of Git? Distributed development and topical branches have changed how I think about development. You could say it's changed my life. I love it. We just all have to understand the differences and deal with them realistically.

Comments

Michael Prasuhn (not verified)

Sun, 2011-01-30 19:04

Avoiding Git Disasters: A Gory Story

Disaster 1: git push --force

Disaster 2: Merging Without Understanding

Alternatives to the Merge Workflow

Conclusions

Comments

Pages

Where to find me

Search