I've taken some time to try to convert pkgsrc and src to mercurial, and this is a kind of braindump of what happened, lessons learned, etc.

The goal was to produce a mercurial mirror of src and pkgsrc which I could use personally. The challenge was to get there, armed with only a copy of the CVS repo, and to exercise various conversion tools in the process.

The current status is this: as of March 28th, 2014, I have mercurial, git, fossil (and cvs) copies of the original NetBSD cvs repos for src and pkgsrc. The reasons for having so many different types of repos will become apparent later on. The sizes for the src repos are as follows:

    26G   src.hg2
    15G   src.hg1
    2.1G  src.hg
    6.8G  src.git
    5.8G  src-20140328.fossil
    9.8G  repo/src (cvs)

The reason for the three mercurial repos will also become apparent later.

1. Straight "hg convert" from cvs to mercurial

Having taken a copy of the ,v files in the pkgsrc cvs repo, I started out with a straight "hg convert" of the cvs repo. This turned up a number of interesting finds:

+ the conversion process uses cvsps, and then parses the commit logs. Unfortunately, there are 2 commits to dcraw which quote the upstream cvs commit log verbatim, and the "hg convert" process thinks that this is the start of a new change, gets weirded out by the unfamiliar rev log number, and aborts.

+ I also had to remove the rcs files for openjade and crimson from my conversion process, or hg convert aborted there too

+ pkgsrc/sysutils/user/files/Attic/md5,v is a zombie that won't die, and also aborted the conversion. This is my own fault, as I wrote the code :-) This one was also fixed manually.

After over 3 days, I finally had a converted hg repo; unfortunately, only about half of the files were present, whether in a cloned repo or in a working copy. This wasn't usable in any way, so it was time to try another approach to conversion.

2. reposurgeon

I set up reposurgeon after the previous failure.
This was more promising, but fairly heavy on resources - the process would eat up 12-16 MB of memory every five seconds. By adding swapfiles with swapctl I got up to 20GB of swap, and reposurgeon was still eating it at the same rate. After the process died overnight, when I wasn't around to add even more swap, I decided there must be a different approach. In retrospect, this may be an artefact of the repo size discovered later on, but without any information from reposurgeon, it's difficult to tell.

3. cvs to fossil to git to hg

The method that worked for me in the end is thanks to Joerg, and basically does a conversion through every conceivable DVCS to get to a usable mercurial repo (oh, bazaar, damn). Joerg has written up all the ins and outs of his conversion process, and makes his git mirrors available for everyone to use:

    https://blog.netbsd.org/tnf/entry/fossil_and_git_mirrors_of
    https://github.com/jsonn/pkgsrc
    https://github.com/jsonn/src

and the fossil files:

    http://ftp.netbsd.org/pub/NetBSD/misc/repositories/fossil/

and his sources for the cvs2fossil conversion are discussed here:

    http://www.sonnenberger.org/2011/05/12/may-update-cvs2fossil/

So I used the same method that Joerg used (not optimised to miss out the top-level directory), and after a day of fossil and git fast-import, I too had git repos I could use. The theory was that the path from cvs to git is much better trodden than the one from cvs to mercurial, and so bugs are more likely to have been zapped by others along the way. And I thank them for that. On the bright side, all of the converted files seemed to be in the resulting git repo.

The "hg convert" from git to mercurial seemed to go much quicker than the one from cvs to hg, at a rough estimate around 10000 changesets per hour. About a day later I had a pkgsrc.hg repo. It was 26GB in size.
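
As a rough illustration of the fossil-to-git-to-hg leg described above, here is a minimal Python sketch that only builds the command lines and runs nothing. It is not a record of the commands actually used. The helper names and the repo paths are hypothetical. The commands assume the stock "fossil export --git", "git fast-import" and "hg convert" interfaces, with the convert extension enabled on the command line. The cvs-to-fossil step is left out, since it relies on Joerg's own tooling.

```python
import shlex


def fossil_to_git_cmds(fossil_repo, git_dir):
    """Return shell command strings for the fossil -> git step.

    Hypothetical helper: it only formats the commands, it does not run them.
    """
    repo = shlex.quote(fossil_repo)
    dest = shlex.quote(git_dir)
    return [
        # Create an empty git repo to import into.
        f"git init {dest}",
        # Stream fossil's fast-export output straight into git fast-import.
        f"fossil export --git {repo} | git -C {dest} fast-import",
    ]


def git_to_hg_cmd(git_dir, hg_dir, datesort=True):
    """Return the argv list for converting a git repo to mercurial.

    datesort mirrors the --datesort switch mentioned in this post; pass
    datesort=False to fall back to hg convert's default ordering.
    """
    cmd = ["hg", "--config", "extensions.convert=", "convert"]
    if datesort:
        cmd.append("--datesort")
    cmd += [git_dir, hg_dir]
    return cmd


if __name__ == "__main__":
    # Dry run: print what would be executed for a made-up set of paths.
    for line in fossil_to_git_cmds("pkgsrc.fossil", "pkgsrc.git"):
        print(line)
    print(" ".join(git_to_hg_cmd("pkgsrc.git", "pkgsrc.hg2")))
```

Keeping this as a dry run is deliberate: each real step takes somewhere between hours and a day on repos of this size, so it is worth eyeballing the commands before committing to them.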
David Holland kindly pointed me towards mpm, and he gave me some guidance on some ways to reduce the size of the mercurial repo, straight from the horse's mouth. The way we've grown the cvs repo over the years brings out the worst in mercurial's space management, and the --datesort switch I used when converting from git to hg also seemed suboptimal. mpm also went and created the http://mercurial.selenic.com/wiki/GeneralDelta page, which says:

    "The original Mercurial compression format has a particular weakness in storing and transmitting deltas for branches that are heavily interleaved. In some instances, this can make the size of the manifest data (stored in 00manifest.d) balloon by 10x or more. The generaldelta option is an effort to mitigate that, while still maintaining Mercurial's O(1)-bounded performance."

Armed with that information, I then did 2 further clones on the src repo (see sizes above) using generaldelta, and brought down the size from 26GB to 2.1GB. The saving on the pkgsrc repo was even more spectacular:

    2.7G  pkgsrc-20140325.fossil
    2.6G  pkgsrc.git
    22G   pkgsrc.hg2
    1.1G  pkgsrc.hg1
    638M  pkgsrc.hg
    1.6G  repo/pkgsrc

Thanks to mpm and the folks who helped out on #mercurial on freenode, and a huge raspberry to the pastebin fanatics there.

The upshot of this is that I now have mercurial (and git and fossil) mirrors, if anyone wants to try it/them out as an alternative to cvs. I'm currently negotiating with the system administrators at NetBSD.org about getting some resources to house this on a more formal basis.
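
For anyone wanting to repeat the generaldelta re-clone and check the savings quoted above, here is a small Python sketch. The clone command follows the general form given on the GeneralDelta wiki page, not necessarily the exact invocation used for these repos. The helper names are hypothetical. The size arithmetic assumes the "G" and "M" figures are du-style binary units, which is also an assumption.

```python
def generaldelta_clone_cmd(src, dst):
    """Return the argv list for re-cloning a repo with generaldelta enabled.

    --pull forces the revlogs to be rewritten instead of copied, and
    -U skips creating a working copy in the destination.
    """
    return [
        "hg", "clone", "-U",
        "--config", "format.generaldelta=1",
        "--pull", src, dst,
    ]


def shrink_factor(before_gb, after_gb):
    """How many times smaller the repo became."""
    return before_gb / after_gb


# Sizes quoted in this post as (first conversion, final repo), in GB.
# The 638M figure for pkgsrc.hg is converted assuming 1G = 1024M.
SIZES_GB = {
    "src": (26.0, 2.1),
    "pkgsrc": (22.0, 638 / 1024),
}

if __name__ == "__main__":
    # Dry run only: print the command and the computed ratios.
    print(" ".join(generaldelta_clone_cmd("src.hg2", "src.hg")))
    for name, (before, after) in SIZES_GB.items():
        print(f"{name}: {shrink_factor(before, after):.1f}x smaller")
```

On these numbers, src shrinks by a factor of roughly 12 and pkgsrc by roughly 35.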