Git Adoption Soaring; Are There Good Migration Strategies?
Got To Get Me A Git writes "Distributed version control systems (DVCS) seem to be the next big thing for open source software development. Many projects have already adopted a DVCS and many others are in the process of migrating. There are a lot of major advantages to using a DVCS, but the task of migrating from one system to another appears to be a formidable challenge. The Perl Foundation's recent switch to Git took over a year to execute. The GNOME project is planning its own migration strategy right now after discovering that a significant majority of the project's developers favor Git. Perhaps some of the projects that are working on transitions from other mainstream version control systems can pool their resources and collaborate to make some standardized tools and migration best practices documentation. Does such a thing already exist? Are any folks out there in the Slashsphere working on migrating their own project or company to a DVCS? I'd appreciate some feedback from other readers about what works and what doesn't."
Git links (Score:4, Informative)
Re:Adopt a git... (Score:5, Informative)
It's more than just a moron; it's a nasty, stubborn, self-centred and selfish moron.
"Our neighbour is a right old git" could be used to describe an elderly neighbour who, say, regularly blocked your driveway because your car got in the way of the sunlight on his garden.
The old neighbour from Dennis, or Victor Meldrew from One Foot in the Grave, are both fine examples of gits.
It's like a weaker version of the c-word.
If it looks like a tree, you'll probably be fine (Score:5, Informative)
If the system you are migrating from manages trees, you should be fine. CVS migration is pretty easy and I understand that Perforce works quite well too (in both directions!). Most of the migration tools are listed in the GIT FAQ [git.or.cz].
The place where people are likely to have trouble is in migrating from tools that don't understand that there's more than one file. For example, RCS and SCCS both support branches, but in a completely different way to git (branches are per-file; they're not for the whole repository). This means that during conversion, something useful has to happen with them, but the right answer isn't clear to a program. If versions 1.1, 1.1.1.1, 1.1.1.2 and 1.2 of file "foo" exist, then versions 1.1.1.2 and 1.2 are on different branches and either may be the older revision. It's not clear whether revision 1.43.1.3 of file "bar" is on the same "branch" as "foo" 1.1.1.2 or not. Because RCS and SCCS deal with single files only, it's not possible to find an answer to these questions in the history files at all; if there is an answer, it's just a convention of the user. Essentially what's happening here is that the git import process requires information which isn't represented in the files you're converting from. For what it's worth, migrating from SCCS or RCS to CVS has a similar problem.
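To see why a converter has nothing to go on, here is a small sketch of how RCS-style revision numbers encode branches, assuming standard RCS numbering (trunk revisions have two fields, a branch adds two more). The function names are hypothetical. Notice that the numbers only describe one file's own tree: nothing in "1.1.1" for foo or "1.43.1" for bar says whether they are the same logical branch, and nothing says whether foo 1.1.1.2 is older than foo 1.2. A tool would have to fall back on something outside the numbering, such as timestamps or log messages.

```python
from typing import Optional


def branch_of(rev: str) -> str:
    """The branch a revision is on: every field except the last ("1" is the trunk)."""
    return rev.rsplit(".", 1)[0]


def branch_point(rev: str) -> Optional[str]:
    """The revision this revision's branch sprouted from, or None on the trunk."""
    fields = rev.split(".")
    return ".".join(fields[:-2]) if len(fields) > 2 else None


# Per-file numbering only: these tuples are independent histories.
for name, rev in [("foo", "1.1.1.2"), ("foo", "1.2"), ("bar", "1.43.1.3")]:
    print(name, rev, "branch", branch_of(rev), "from", branch_point(rev))
```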
Personally, I've migrated from CVS to git for findutils (well, Jim Meyering did the actual migration; he migrated coreutils too). I haven't regretted migrating at all. It took me a long time of using git locally before I was comfortable migrating the master repo, though. As a git beginner, the thing I found most worrying was that it was hard to envisage the effect of the git command I was typing. The thing it took me a long time to figure out is that with a distributed version control system, it's safe to screw up your local copy, as long as you don't push the result.
Re:Adopt a git... (Score:5, Informative)
strategy (Score:4, Informative)
Re:bzr vs. git? (Score:5, Informative)
I always thought bzr had the edge on git in terms of being a better DVCS. Is there a reason why the article seems to think that git is the default?
No such thing as 'better' here.
Bazaar was the runner-up DVCS, and rightfully so, but it has both advantages and disadvantages. Git is faster and currently more popular. Bazaar has an easier interface, better GUIs, is more easily extensible (Python), and runs better on non-Linux platforms.
So which you prefer is a matter of what you are looking for.
Re:Why is it soaring? (Score:5, Informative)
It's four good reasons.
1. You can use git for any purpose. You have to pay serious coin before you can use Bitkeeper for any purpose.
2. You have the freedom to see how git works, down to every last line of code. I can't comment on whether Bitkeeper also includes this level of freedom.
3. You can make any damn changes you want to Git, without prior approval.
4. You can pass on all these freedoms, and the freedom to use your change, to anybody you want. It was precisely the fact you can't do that with Bitkeeper that led to it being dropped by the Linux developers and replaced with a coded-from-scratch replacement.
Re:Why is it soaring? (Score:4, Informative)
Re:I want to use git (Score:5, Informative)
Re:Adopt a git... (Score:3, Informative)
I wonder if Linus knows?
Re:Git links (Score:5, Informative)
Some things git is bad at:
- no partial cloning, so a big history means lots of stuff to download; this is especially bad when it comes to binary files
- no way to download just a single file or directory, a user always has to clone the whole repository
Re:Why is it soaring? (Score:1, Informative)
1. It does the DVCS work as well as bitkeeper
2. It is free software in both senses of the word
3. It doesn't have a non-compete clause (a very large number of developers don't tolerate such crap)
4. It has reached critical mass (in fact, it is, like the Linux kernel, far above the critical point), and there are so many good people working on every aspect of git (core, enhancements, CLI interface, GUIs, IDE integration plugins, and even "tortoise" crap for the less capable) that it is making a LOT of progress, very very quickly.
So, while bitkeeper has superior UI tools, that won't last long. That doesn't mean you're wrong to use bitkeeper in your company, as bk is a far, FAR better VCS than anything else in the proprietary software world.
Re:Why is it soaring? (Score:5, Informative)
It's worse than that. The bitkeeper author at one point tried to extend that as a ban on anyone who works for a company that has a competing product with bitkeeper.
Re:Git links (Score:4, Informative)
For problem one: git clone --depth 1 (or however far back you want your history to go); note this severely cripples git's abilities and isn't very useful at all unless you're on dialup still.
For problem two: this isn't a real problem with git, but rather with your organization. Multiple projects don't belong in the same repository, it's as simple as that.
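To make the problem-one fix concrete, here is a toy model of what a depth limit keeps: starting from the tip, walk back at most `depth` generations of parents and drop everything older. It is a sketch under stated assumptions (single-letter commit names, a plain dict of parents, a hypothetical function name), not git's actual implementation. `git clone --depth 1` corresponds to depth 1: just the tip commit, with no history behind it, which is why so many history-based operations stop working.

```python
def shallow_history(parents, tip, depth):
    """Collect the commits within `depth` generations of `tip` (breadth-first, all parents)."""
    kept, frontier = [], [tip]
    for _ in range(depth):
        next_frontier = []
        for commit in frontier:
            if commit not in kept:
                kept.append(commit)
                next_frontier.extend(parents.get(commit, ()))
        frontier = next_frontier
    return kept


# Hypothetical linear history: a <- b <- c <- d <- e (e is the tip).
parents = {"e": ("d",), "d": ("c",), "c": ("b",), "b": ("a",), "a": ()}

print(shallow_history(parents, "e", 1))   # like --depth 1: only the tip
print(shallow_history(parents, "e", 3))
```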
Re:Why is it soaring? (Score:1, Informative)
Actually, under the hood Git is technically way superior to BitKeeper.
The only thing they really share is using the distributed model. In other respects, Git is to BitKeeper like Svn is to CVS. Like CVS, BitKeeper uses per-file revisioning, with all of the subtle problems it entails. Git, like Svn, uses a solid approach of revisioning trees.
Another thing is that Git understands branches; BitKeeper only has the concept of trunk. This means no cheap branches, and makes merging a pain (you can't even try to _compile_ a merge to check your conflict resolutions before you commit).
The only reason to use BitKeeper is that your revision history is locked into its proprietary format.
- Kristian.
Re:Git links (Score:5, Informative)
Problem two *is* a problem with git, it has nothing to do with how you organize a project, since you can never guess what a user might want. Simple example: I would like to look at the latest version of a file in the linux kernel, with git I have to download the whole beast when all I want is a single file, which is neither pretty nor fast.
Re:Git links (Score:4, Informative)
I hadn't really thought of that, I had assumed you were referring to Subversion's rather common case where multiple projects are stored in the same repo, and you checkout different directories to access one of them.
Anyhow, most, but not all, public git servers have a gitweb or similar attached, which will at least let you browse and download files from the tree if you need to. For example, grabbing the latest README of Linus' Linux tree can be had via http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob_plain;f=README;hb=HEAD [kernel.org]
Git itself doesn't provide any mechanism for it, but it's fairly unusual to be interested in a specific file rather than the entire project.
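If you grab single files this way often, the gitweb URL is easy to build by hand. This is a sketch that assumes the `;`-separated query style shown in the kernel.org link above, with `p` (project path), `a=blob_plain` (raw file contents), `f` (file path) and `hb` (the ref to read from); the helper name is hypothetical.

```python
from urllib.parse import quote


def gitweb_blob_plain_url(base, project, path, ref="HEAD"):
    """Build a gitweb 'blob_plain' URL that serves one file's raw contents."""
    params = [("p", project), ("a", "blob_plain"), ("f", path), ("hb", ref)]
    query = ";".join(f"{key}={quote(value, safe='/')}" for key, value in params)
    return f"{base}?{query}"


url = gitweb_blob_plain_url(
    "http://git.kernel.org/", "linux/kernel/git/torvalds/linux-2.6.git", "README"
)
print(url)
```

You could then fetch just that one file with `urllib.request.urlopen(url).read()`, assuming the server still exposes gitweb at that path (hosting layouts change over time).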
Re:Why is it soaring? (Score:4, Informative)
Re:Bazaar (Score:3, Informative)
Huh? Mercurial has no issue with revision numbers; revisions are indexed with a hash just like git. The sequential numbering is available as a convenient alternative for referring to a rev, that's all.
Re:Git + Eclipse (Score:2, Informative)
Re:Bazaar (Score:2, Informative)
hg has both a revision number and a changeset id. The revision number is human readable and useful within a particular repo. The changeset id is unique across repos, the same as git does.
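A toy model of the difference: the changeset id is derived from the content and its parents, so every clone computes the same one, while the revision number is just the order in which that particular repo received the changesets. This is a simplified sketch (a SHA-1 over sorted parent ids plus a content string, hypothetical class and function names), not Mercurial's exact hashing.

```python
import hashlib


def changeset_id(parents, content):
    """Global id: a hash over the parent ids and the content (simplified)."""
    h = hashlib.sha1()
    for p in sorted(parents):
        h.update(p.encode())
    h.update(content.encode())
    return h.hexdigest()


class Clone:
    """One repository; local revision numbers are arrival order only."""

    def __init__(self):
        self.order = []

    def add(self, cid):
        if cid not in self.order:
            self.order.append(cid)

    def rev(self, cid):
        return self.order.index(cid)


root = changeset_id([], "initial import")
alice = changeset_id([root], "alice's change")
bob = changeset_id([root], "bob's change")

mine, yours = Clone(), Clone()
for cid in (root, alice, bob):
    mine.add(cid)
for cid in (root, bob, alice):  # same changesets, pulled in a different order
    yours.add(cid)

print(mine.rev(alice), yours.rev(alice))  # same changeset, different local numbers
```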
Re:Git links (Score:2, Informative)
Unless you are intensely involved in a project, browsing for the single file using some alternative interface is probably going to be easier. I.e.:
http://github.com/github/linux-2.6/tree/master/kernel/fork.c [github.com]
That doesn't work if the repository doesn't have an alternative interface though (but for projects you are involved in, the download only needs to be the difference between the internal and external repositories, not the entire history).
Re:Darcs (Score:2, Informative)
Re:bzr vs. git? (Score:3, Informative)
TortoiseGit: the best shot at a Windows GUI (Score:3, Informative)
At work we're trying to get all of our repos moved to Git. We moved off of CVS to SVN a year or so ago (which was a huge improvement), but now that all of the non-programmers in the office are used to using TortoiseSVN [tigris.org], lack of a good Windows GUI for Git has been a bit of a roadblock.
The msysgit [google.com] folks started work on Tortoise-inspired GitCheetah [google.com] GUI, but that project basically fizzled out. Lots of people wanted a Windows GUI, but no one had both the resources and drive to step up and do it.
Then, exactly two months ago, Frank Li started working on TortoiseGit [google.com]. From what I can tell, this is a fork of TortoiseSVN with most of the Subversion guts pulled out and replaced with git commands. TortoiseGit is not done yet ('git add' has some issues, submodules don't seem to work at all, etc.), but development on the tool is in high gear and the primary developer is going the extra mile to help users [google.com].
If you're looking to deploy tools right now, gitk is a bit more powerful than the log in TortoiseGit, but might be more confusing for naive users.
Re:Git links (Score:2, Informative)
In Mercurial, every changeset is marked as belonging to a single branch (the default is a branch named "default"). This marking happens at commit time and cannot be changed afterwards (short of creating a new commit). Typically a changeset inherits the branch name of its first parent, but you can always mark it differently before the commit. So instead of a branch being a pointer to a single changeset, it actually identifies some subset of the DAG, and when you refer to a branch you are asking Hg to figure out the tip of the DAG subset marked with that branch name. Note that since the branch name is recorded internal to the changeset, it remains the same for all clones of the repository.
Personally, I think the git mechanism is more flexible, but the hg mechanism suffices just fine for basic use cases, and leads to fewer surprises among new users.
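The two models side by side, as a deliberately simplified sketch: hg-style branch names live inside each (immutable) changeset, and "the branch" is resolved by finding the tipmost changeset carrying that label; git-style branches are just movable name-to-commit pointers kept outside the commits. The class, function and branch names here are hypothetical, and real hg head resolution handles cases (multiple heads, closed branches) this ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: a recorded label can't be edited after commit
class Changeset:
    rev: int               # local revision number
    parent: Optional[int]
    branch: str            # hg-style: the branch name is stored inside the changeset


history = [
    Changeset(0, None, "default"),
    Changeset(1, 0, "default"),
    Changeset(2, 1, "stable"),   # marked "stable" at commit time
    Changeset(3, 1, "default"),
]


def hg_branch_tip(changesets, name):
    """hg-style lookup: the tipmost changeset whose recorded label is `name`."""
    marked = [c for c in changesets if c.branch == name]
    return max(marked, key=lambda c: c.rev) if marked else None


# git-style: a branch is a movable name -> commit pointer outside the history.
git_refs = {"master": 3, "stable": 2}
git_refs["stable"] = 3  # moving a ref rewrites nothing in the history itself

print(hg_branch_tip(history, "default").rev, hg_branch_tip(history, "stable").rev)
```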
Re:How does it work with non-static IPs? (Score:2, Informative)
There's no way to pull from a repo that's behind a NAT unless you have sufficient control over the NAT to forward a particular port to a specific machine behind it. This is the same as svn -- how can you access an svn server that's behind a NAT? Only by having its relevant port forwarded.
However, if you are in a coffee shop and you want Sue to have your devel history, you can push to Sue instead of having Sue pull from you. Sue will then merge your pushed changes into her working copy when she feels like it.
Equivalently, you can set up another git repository on your home server which has a static IP. Then, you pull from and push to your home server, which you can access from anywhere with your laptop, and other people also pull from your home server. I used this approach when I was developing from home and didn't have time to make sure permissions on everything were okay and to grant other developers ssh accounts on my machine, and didn't have time to set up an http server for the repo.
@#$@#$ git! (Score:5, Informative)
I curse more when I use git than when I use Windoz (and those are the only times I curse). Git's design is really that bad (from a user perspective).
Git is fully distributed (with no "authoritative" source), but it doesn't give you any tools to understand/manage the distribution of files. If you have a work group with more than a few people, you are constantly asking what repo (and what access method to it), what branch, and what (bombastically long) revision. It's fine for 1-2 people, but then any version control system is fine for a small enough group.
The documentation helps little. When you do "git help merge", you don't @#$@# care that this is the front end to multiple merge methods. You just want the stupid thing to work. If it's a special case, then you'll look for an advanced technique; but you are stuck reading through all this crap trying to figure out what really matters. No offense to the people working on git docs. It used to be awful, now it just sucks. The problem is more in the user interface design than the docs.
There are over 100 git commands, and a command can do radically different things depending on the switches and target syntax. It's more confusing than any other revision control system that I have worked with.
I use git because I have to, not because I want to (like Windoz). After using it for months, I still routinely get stuck trying to figure out the right mix of commands, arguments, and target syntax needed to get common things done.
Git can do some (nice) things that subversion can't, but it creates so many problems that you haven't gained anything.
I've heard good things about Mercurial and Bazaar. I wouldn't recommend git to anyone I liked (but it's perfect for perl :-).
Re:How does it work with non-static IPs? (Score:3, Informative)
There aren't any strict rules saying that people have to pull straight from your laptop.
In terms of non-distributed VCSes, would you ever host your repository on a machine that other people couldn't access? It would always be somewhere publicly accessible.
For this kind of situation, you'd probably have a public development repo that's separate from your official repository. This would give you a set of repositories that looks like:
official - The authoritative repository, controlled by some kind of integration manager
jim-dev-public - Code that Jim is ready to unleash upon the world, but not upon the official repository
jim-dev-private - Code that Jim is currently working on from his blimp with irregular Internet access
fred-dev-public, alice-dev-public, bob-dev-public - These are the only ones that you need to pull from. Fred, Alice, and Bob can have as many private repositories as they'd like, and will share their work when it's ready.
complexity (Score:3, Informative)
The complexity of git robs it of quite a bit of the value of its features. For God only knows what reason, a 5-6 person project that I'm working on is using git instead of subversion, and only the person who set up the project actually has any idea how to use git. The rest of us are just cruising along, not really having any idea of what we are doing with it, and are stopped completely whenever it does anything weird.
It's awesome to have the whole thing where it merges all the changes in a same file together, fairly intelligently, but even the GUI version for Windows has no functional interface for dealing with conflicts (which should be easily done as a "which bit of code is the proper piece to use here?" choice instead of jamming diffs into a file). Also, the Windows and Linux versions of GIT have several problems interoperating with each other.
In short, Git appears to have been designed entirely with features in mind, and not one bit of usability for anyone other than Linus himself. It is a nightmare for people who only have the need for version control and a handful of people working together. It reminds me very, very much of early Linux, before anyone else besides Linus had been hacking on it.
Re:How does it work with non-static IPs? (Score:3, Informative)
Take a page out of Freedesktop.org's process. Any user can create and maintain user repositories in their own space. For example, http://cgit.freedesktop.org/~csimpson/mesa [freedesktop.org] is my personal Mesa repo. Then, anybody that wants can pull from there. Very rarely do FD.O people pull and push directly to each other, and I doubt that it happens that way in larger organizations, either.
Re:Git links (Score:5, Informative)
In case you were looking for answers rather than abuse: I have used both. For me git does what svn does, plus the following in order of most important first.
Of course, this is for me, and all points might be irrelevant for you.
Re:Mercurial vs. Git (Score:2, Informative)
Have they improved branching in hg?
Bookmarks are what you're after, and they came in a couple of versions ago.
Re:Meanwhile... (Score:3, Informative)
Based on my imperfect reading, I can see two main appeals of ClearCase:
Schwab
Re:My usage of Subversion (Score:3, Informative)
The main advantage of git over subversion for such uses is that git doesn't require a separate repository; the repository sits right there in the .git/ directory in your project's directory. That doesn't sound like much, but it's a great convenience, since all you have to do to start using git is type 'git init' and you are done: you don't need to create a repository, you don't need to import your existing files, and most importantly you can leave your working directory as is. With SVN this same process can get quite annoying, since you basically have to delete your working directory, replace it with an SVN checkout and then verify that all your files actually made it into the repository. With git it's a single command and you don't have to think about anything, so it's much easier and you can just version control any directory you like.
Re:Git links (Score:3, Informative)
I can give you the reason why I switched.. which may or may not help.
With SVN, I found that branching was so involved that I wouldn't do it. Instead, I would check out code, work on it, and wouldn't check it back in until I had completed whatever I was doing... which may be days or weeks away.
Checking code back in with SVN almost became a "release".
With Hg, I pull down a copy of the code, make changes, commit those changes to my local repo, even if things are so broken they don't even compile. Then, when I'm done, I'll push all my commits back to the server.
Re:@#$@#$ git! (Score:1, Informative)
Git is fully distributed (with no "authoritative" source), but it doesn't give you any tools to understand/manage the distribution of files. If you have a work group with more than a few people, you are constantly asking what repo (and what access method to it), what branch, and what (bombastically long) revision. It's fine for 1-2 people, but then any version control system is fine for a small enough group.
This is a wetware problem. If you are managing a product with git (or any distributed version control system) there must be one repository where the product is built from. With Linux this is Linus's repository. Everyone works on stuff and tries to get that stuff into his tree, and he often argues about why their changes are wrong and shouldn't go in. There are levels of indirection because of subsystem maintainers who also get to argue with developers before letting changes into their trees.
When stuff does get into Linus's tree then everyone else pulls it back so all the other subsystem developers get that, and Linus publishes the product -- an official (or release candidate) Linux kernel source.
Re:Git links (Score:3, Informative)
Some of our submodules are 20 times bigger than the Linux kernel and there is no way to subdivide them more than that.
But do they accept more changes than the Linux kernel does? Linus Torvalds's 2.6.28 tree alone (which goes back to 2.6.12-rc2, dated April 16, 2005) has 120035 commits. That doesn't include any branches that others have worked on.
Re:Git links (Score:3, Informative)
Ignore the fanboys. If anything, use them as statistical evidence that there might be something worthwhile here. :-)
Why git for a SVN user? There's nothing better than trying it for yourself (git-svn clone svn://whatever, then hack on it with git, then git-svn dcommit). But until then, two big points:
1. It's distributed. I can make lots of commits without pushing them somewhere public, which is good for the same reasons that hitting "Save" often is good, without being worried that I've broken the build for everyone.
Relatedly, I can put my stupid personal projects in git from the beginning, without bothering to set up a server. But if I find I do want to share it with anyone or add anyone else as a contributor, there's basically no difficulty in doing so.
2. Git lets you rewrite your local history. If I didn't like a commit, I can edit it before sending it off. My workflow is often edit-commit-compile-test, rather than edit-compile-test-commit, which lets me remember why I thought editing this file was a good idea. And if it's not, I can delete that commit from history, rather than having yet another one to revert it. And then when I'm done with this task, I can squash all my temporary commits down to two or three, one for each major part of what I did. As a side benefit, my commit messages only have to be useful enough for me, since I can edit them before pushing commits.
(Obviously, once you publish a commit, it's a pain to retract that commit from everyone else's repos.)
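In git itself the squashing is done with `git rebase -i` and its `squash`/`fixup` actions. The toy model below only illustrates the idea: a run of scratch commits is folded into fewer commits whose combined changes are the same. It assumes a hypothetical representation of a commit's changes as a dict of file name to new contents (real commits carry trees and diffs), and the `squash` helper and its `plan` format are made up for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    message: str
    changes: dict = field(default_factory=dict)  # hypothetical: file name -> new contents


def squash(commits, plan):
    """Fold consecutive local commits into fewer ones.

    `plan` is a list of (message, count) pairs: each pair consumes the next
    `count` commits and replaces them with one commit carrying their combined
    changes (later edits to the same file win).
    """
    result, i = [], 0
    for message, count in plan:
        combined = {}
        for commit in commits[i:i + count]:
            combined.update(commit.changes)
        result.append(Commit(message, combined))
        i += count
    if i != len(commits):
        raise ValueError("plan must consume every commit exactly once")
    return result


local = [
    Commit("wip parser", {"parser.py": "v1"}),
    Commit("fix typo", {"parser.py": "v2"}),
    Commit("wip tests", {"test_parser.py": "v1"}),
    Commit("oops, forgot a case", {"test_parser.py": "v2"}),
]
published = squash(local, [("Add parser", 2), ("Add parser tests", 2)])
for c in published:
    print(c.message, c.changes)
```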
Another part of rewriting history is that rather than trying to do a merge when you've been hacking for a while, you can save your local commits in another branch, update to upstream, and cherry-pick the ones that make sense individually and edit the ones that need to be edited, creating a linear logical history rather than a merge between branches. This will make you saner a month later when you're trying to figure out why the code changed as it did, and you don't have to follow multiple branches and see how they were resolved.
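A toy model of that save-then-cherry-pick workflow: each chosen local change is replayed, one at a time, on top of the updated upstream tip, giving a straight line of new commits instead of a merge. It is a sketch under stated assumptions: commit ids are a hypothetical truncated SHA-1 of parent plus change, a change is just a string, and conflicts (which `git cherry-pick` would stop on) are not modelled.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    id: str
    parent: Optional[str]
    change: str  # stand-in for the patch a commit introduces


def make_commit(parent, change):
    """Id derived from parent + change, so a replayed change gets a new id."""
    digest = hashlib.sha1(f"{parent}:{change}".encode()).hexdigest()[:8]
    return Commit(digest, parent, change)


def cherry_pick_onto(base_id, picks):
    """Replay the chosen commits' changes one at a time on top of base_id."""
    chain, tip = [], base_id
    for commit in picks:
        replayed = make_commit(tip, commit.change)
        chain.append(replayed)
        tip = replayed.id
    return chain


root = make_commit(None, "initial")
upstream = make_commit(root.id, "upstream fix")
# Local work, saved on another branch before updating to upstream:
mine1 = make_commit(root.id, "feature part 1")
mine2 = make_commit(mine1.id, "debug printf")
mine3 = make_commit(mine2.id, "feature part 2")

linear = cherry_pick_onto(upstream.id, [mine1, mine3])  # leave the debug commit out
for c in linear:
    print(c.id, "parent", c.parent, c.change)
```

Because the replayed commits get new ids, this is exactly the kind of history rewrite the previous paragraph warns about doing to commits you have already published.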