Using Mercurial and mq to work on NetBSD

This page contains directions for using Mercurial as a commit buffer for NetBSD.

(It will not do you much good if you're trying to convert the master NetBSD tree to Mercurial or to work with such a converted tree.)

What it is

Mercurial is a distributed version control system ("DVCS"). mq is an extension to Mercurial for handling patch queues. The concept of patch queues was introduced by Quilt some years back.

This document assumes you already know more or less how to use Mercurial but may not have used mq before.

The model we're using here

What we're going to do is commit a NetBSD CVS working tree into a Mercurial repository. You can then use Mercurial to merge; it is better at this than CVS. You can also commit changes locally and ship them back to the CVS master later; this is useful in a variety of ways. You can potentially also clone the Mercurial tree and work jointly with other people, but there are limits to this as we'll discuss in a moment.

Because the NetBSD tree is rather large, you will find that if you commit the whole thing into Mercurial, a number of operations (anything that scans the working tree for changes) become annoyingly slow. It isn't slow enough to be unusable, and it's quite a bit faster than a comparable CVS operation (like running cvs update from the top level), but it's slow enough to be annoying.

For this reason, in most cases, I recommend committing only part of the tree into Mercurial and telling it to ignore the rest. This means you can't in general usefully clone the resulting Mercurial repo (although that depends on exactly what you leave out) but this is not a major problem unless you're specifically trying to work with someone else.

So the basic model here is that you check out a CVS working tree and use Mercurial to manage local changes to part of it, then later on commit those changes back to the master CVS repository.

Branches vs. patches

There are two ways you can manage your changes: as a branch, or as a patch queue. The advantage of a patch queue is that you can easily commit each patch individually back to CVS, and you can go back and forth between them and debug and polish each one separately. The disadvantage is that the merge facilities are (as far as I know anyway) relatively limited.

Conversely, if you commit your changes to a branch, you get all the native merging support in Mercurial. However, it is painful to try to commit anything other than one big diff for the whole branch back to CVS. (You might be able to do it via bookmarks and rebasing, but I've never tried and have no desire to figure out how.)

If you don't want to keep the incremental history of your local commits, use a branch. If you do, use a patch queue.

It is possible to use multiple branches to allow you to commit back in several stages. However, managing this is a major pain and I don't recommend it -- you might get away with two branches but more than that is probably a bad idea.

There's a Mercurial extension called the "patch branch extension" that lets you manage a whole graph of patches using branches. I haven't tried using it in some years; at the time it had scaling problems such that it became horrifyingly slow once you had more than a handful of such branches. That might have been improved in the meantime; if you find yourself wanting to use both branches and patches, it might be worth looking into.

It is also fairly probable that there is now a solution for merging with patch queues; it's been a while since I had time to look closely.

Setting up

First, check out a CVS working tree. You probably want to use a different one for each project, because different projects require changing different parts of the tree and so you will probably want to have Mercurial ignore different subtrees for different projects. (At least, I find it so; it depends on what you're working on.)

	% cvs -d cvs.netbsd.org:/cvsroot checkout -P src

Now create a Mercurial repository at the top level. (If you are working only in a subtree and you are *sure* that you will never need to change anything in other parts of the tree, you can create the Mercurial repository in a subtree. But unless you're absolutely certain, don't take the risk.)

	% cd src
	% hg init

If you're going to be using a patch queue, now enable mq.

	% vi .hg/hgrc
and add
	[extensions]
	hgext.mq =
(Since the extension ships with Mercurial, that's all you need.) If you prefer, you can instead put this in your ~/.hgrc so that mq is always on. Then do
	% hg qinit -c
The -c option tells mq that you'll be checkpointing your patches, which is usually a good idea.

Now prepare a .hgignore file. This file contains one regular expression per line; Mercurial ignores files (and subdirectories) whose paths from the repository root match one of the regexps. Add at least:

	^CVS$
	/CVS$
to ignore all the CVS control directories in the CVS checkout. While you can commit these to Mercurial, there's no point, and it gets awkward if a later mistake forces you to merge them.

If you aren't arranging to put the tree's object directories somewhere else, then also add

	^obj\.[0-9a-z]*$
	/obj\.[0-9a-z]*$
and you might want
	^sys/arch/[0-9a-z]*/compile/[A-Z]
to ignore kernel build directories.

Ignore subtrees that you aren't working in. You don't have to bother to be very selective; the goal is to rapidly rule out a few large subtrees that you definitely don't care about, in order to avoid wasting time scanning them for changes. Unless you plan to be working with 3rd-party software,

	^external$
	^gnu/dist$
is a good starting point. Alternatively, if you aren't going to be working on MD kernel stuff or bootloaders,
	^sys/arch$
is a good choice as it's also large.

You can always unignore stuff later, so don't worry about remote possibilities.
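
To sanity-check a set of ignore patterns before committing them, something like the following sketch may help. It only approximates Mercurial: it uses grep -E where Mercurial uses Python regular expressions, which should behave the same for simple patterns like the ones above. It also assumes the file contains nothing but regexps, blank lines, and # comments. The function names are made up for illustration.

```shell
# Sketch: would PATH (relative to the repository root) be ignored by the
# regexps listed in IGNOREFILE? Approximates Mercurial with grep -E.

# Succeeds if STRING ($2) matches any pattern in IGNOREFILE ($1).
# Blank lines and "#" comments are skipped.
matches_any_pattern() {
    while IFS= read -r pat; do
        case $pat in
            ''|'#'*) continue ;;
        esac
        if printf '%s\n' "$2" | grep -E -q -e "$pat"; then
            return 0
        fi
    done < "$1"
    return 1
}

# Succeeds if PATH ($2) or any of its leading directories matches, since
# an ignored directory takes everything below it along.
would_ignore() {
    rest=$2
    prefix=
    while [ -n "$rest" ]; do
        # Extend the prefix by one path component: a, a/b, a/b/c, ...
        prefix=${prefix:+$prefix/}${rest%%/*}
        case $rest in
            */*) rest=${rest#*/} ;;
            *)   rest= ;;
        esac
        if matches_any_pattern "$1" "$prefix"; then
            return 0
        fi
    done
    return 1
}
```

For example, "would_ignore .hgignore bin/ls/CVS/Entries && echo ignored" reports whether that path falls under one of the patterns.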

Now commit the .hgignore file:

	% hg add .hgignore
	% hg commit -m 'add .hgignore file' .hgignore
Now add and commit the contents of the working tree:
	% hg add
	% hg commit -m 'HEAD of 20130101'
(or whatever date)

You are now in business.

Working

If you're using a branch, remember to change branches before you commit anything:

	% hg branch mystuff
You want to keep the default branch an untouched CVS tree so you can use Mercurial to merge. (And also so you can use Mercurial to extract diffs against CVS HEAD and so forth.)

Similarly, if you're using a patch queue, put everything in patches and don't commit. (There's a section below about working with mq if you aren't familiar with it.)

You can edit and build and test as normal. Use hg commit or hg qrefresh to sync stuff into Mercurial.

If you're using mq, it's a good idea to checkpoint your patch queue periodically. This is done as follows:

	% hg qcommit
The patches directory (.hg/patches) is stored in its own Mercurial repository, and this commits the patches to that repository. If necessary you can then fetch older versions of the patches back and so forth.

Updating from CVS

First, make sure all your changes are committed. (If you have unfinished changes that aren't ready to commit, there's a Mercurial extension for stashing them temporarily. If you have stuff that you don't want to commit at all, like debugging printouts or quick hacks, it's often convenient to keep those in their own mq patch, even if you aren't using mq for development.)

Now go back to a clean CVS tree. If using branches, go back to the default branch:

	% hg update -r default
If using mq, pop all the patches:
	% hg qpop -a
DO NOT run cvs update unless you have done this; running it in any other state will make a mess. When you eventually run cvs update in the wrong state by accident, see the section below on recovering from mistakes.
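
If you want a safety net, a small guard along the following lines can be run first. This is only a sketch: the function name is made up, and it assumes hg is on your PATH and that the pristine CVS state lives on the default branch as described above.

```shell
# Sketch: succeed only if the working tree is in the pristine-CVS state:
# on the default branch, no mq patches applied, no uncommitted changes.
ready_for_cvs_update() {
    if [ "$(hg branch)" != default ]; then
        echo "not on the default branch; run: hg update -r default" >&2
        return 1
    fi
    # qapplied prints nothing when no patches are applied; errors (such
    # as mq not being enabled) are discarded.
    if [ -n "$(hg qapplied 2>/dev/null)" ]; then
        echo "mq patches are applied; run: hg qpop -a" >&2
        return 1
    fi
    # Modified, added, removed, or deleted files mean unsynced work.
    if [ -n "$(hg status -mard)" ]; then
        echo "uncommitted changes; commit or qrefresh them first" >&2
        return 1
    fi
    return 0
}

# Typical use (not run here):
#   ready_for_cvs_update && cvs -q update -dP
```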

Now run cvs update from the top of the source tree:

	% cvs -q update -dP
You should get no conflicts from CVS and nothing should show as modified. (It is usually a good habit to save the cvs update output to a file to be able to check this.)
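
One way to do that check is sketched below. It relies on the status letter that cvs update prints at the start of each line (M for a locally modified file, C for a conflict); the function name and the log file name are arbitrary.

```shell
# Sketch: scan saved "cvs update" output for files that CVS reports as
# locally modified (M) or conflicting (C). Prints the offending lines and
# returns nonzero if there are any.
check_cvs_update_log() {
    if grep -E '^[MC] ' "$1"; then
        echo "unexpected local modifications or conflicts" >&2
        return 1
    fi
    return 0
}

# Typical use (not run here):
#   cvs -q update -dP 2>&1 | tee cvs-update.log
#   check_cvs_update_log cvs-update.log
```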

Tell hg to sync up:

	% hg addremove
Use hg to check what it thinks has changed:
	% hg status
Commit the changes to Mercurial:
	% hg commit -m 'Updated to 20130202'
Now you get to merge.

If you're using a branch, you want to merge the changes into your branch rather than merge your branch into the changes:

	% hg update -r mystuff
	% hg merge default
	(edit and resolve as needed)
	% hg commit -m 'sync with HEAD'
If it tells you "update crosses branches" when trying to update back to your branch, update to the parent changeset (the previous version from CVS) first, as that's an ancestor of your branch.

If you're using mq, the thing to do now is to push all your patches, and if any reject, clean up the mess and refresh them.

If patch tells you "hunk N succeeded at offset MMM with fuzz Q", it's a good idea to manually inspect the results -- patch being what it is, sometimes this means it's done the wrong thing. Edit if needed. Then (even if you didn't edit) refresh the patch so it won't happen again.
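
For the reject case, a helper like the following can list what patch left behind. It is a sketch with a made-up function name; it simply looks for .rej files and skips Mercurial's own metadata directory.

```shell
# Sketch: list reject files under a directory (default: the current one),
# without descending into .hg. Prints one path per line.
list_rejects() {
    find "${1:-.}" -name .hg -prune -o -type f -name '*.rej' -print
}

# Typical use (not run here):
#   hg qpush -a || list_rejects
#   (fix up each file by hand, delete the .rej file, then hg qrefresh
#   and hg qpush -a again)
```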

As I said above, it's quite likely that by now there's a better scheme for merging with mq that I don't know about yet.

Pushing back to CVS

When you're ready to push your changes back to CVS (so they're really committed), first (unless you're absolutely sure it's not necessary) update from CVS as above and merge. Then:

If you're using a branch, go back to the default branch and merge your changes into it:

	% hg update -r default
	% hg merge mystuff
	% hg commit -m "prepare to commit back to cvs"
Now cvs add any new directories and files; be sure not to forget this. It is a good idea to crosscheck with cvs diff and/or cvs update:
	% cvs diff -up | less
	% cvs -nq update -dP

Then you can cvs commit:

	% cvs commit
Because of RCSIDs, committing into cvs changes the source files. So now you need to do:
	% hg commit -m 'cvs committed'
and if you intend to keep working in this tree, you want to merge that changeset back into your branch to avoid having it cause merge conflicts later. Do that as above.

If you're using a patch queue, usually it's because you want to commit each patch back to CVS individually. First pop all the patches:

	% hg qpop -a
Now, for each patch:
	% hg qpush
	% hg qfinish -a
	% cvs commit
	% hg commit -m "cvs committed previous"
With a long patch queue, you'll want to use the patch comments as the CVS commit messages. Also, running cvs commit from the top for every patch is horribly slow. Both these problems can be fixed by putting the following in a script:
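	#!/bin/sh
	# Save the description of the current changeset as the CVS commit message.
	hg log -v -r. | sed '1,/^description:$/d' > patch-message
	cat patch-message
	# Print the cvs commit command to run, listing only the files touched.
	printf 'cvs commit -F patch-message '
	hg log -v -r. | grep '^files:' | sed 's/^files://'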
(I call this "dogetpatch.sh") and then the procedure is:
	% hg qpop -a
then for each patch:
	% hg qpush && hg qfinish -a && dogetpatch.sh
	% cvs commit [as directed]
	% hg commit -m "cvs committed previous"
(This could be automated further but doing so seems unwise.)

Using CVS within Mercurial

You can successfully do any read-only CVS operation in the hybrid tree: diff, annotate, log, update -p, etc. Read-write operations should be avoided; if you mix upstream changes with your changes you will find it much harder to commit upstream later, and you may get weird merge conflicts or even accidentally revert other people's changes and cause problems.

If you clone the Mercurial tree and you didn't include the CVS control files in it, you won't be able to do CVS operations from clones. Including the CVS control files in the Mercurial tree is one way around that.

You will find that any large CVS operation on a clone is horribly slow. This is because making a clone causes CVS to think all the files in the clone have been modified since you last ran it; it then re-fetches every file you ask it about so it can update its own information. For this reason cloning the Mercurial tree usually isn't worthwhile and even when it is, including the CVS files in the Mercurial tree isn't.

Another consequence of this: do not try to cvs update in a cloned Mercurial repository; use only the original. Updating a clone basically downloads the entire tree over again from the CVS server.

DO NOT CVS COMMIT FROM A CLONE. It is known that some operations that muck with the timestamps in a CVS working tree can cause CVS to lose data. It is not clear if hg clone is such an operation; don't be the person who finds out the hard way.

Recovering from mistakes

The most common mistake is running cvs update when the Mercurial tree is not in the proper state for that; e.g. while you are on your own branch or while you have patches applied.

The basic strategy for this is to use hg revert to restore the part of the tree it knows about, then go back to CVS, clean up the mess there, and update properly.

If you're using a branch:

	% hg revert -a -C
	% hg update -r default
If you're using a patch queue:
	% hg revert -a -C
	% hg qpop -a
The problem is, CVS will now think you've changed every file that Mercurial is managing, and that your modifications consist of reverting all the upstream changes that have happened since your previous update. You do *not* want that to turn into reality. Hunt down (with cvs -n update) any files that CVS thinks are modified, then rm them and run cvs update on them. CVS will print "Warning: foo was lost" and restore an unmodified copy.
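
The hunting-down step can be sketched as a small filter over the cvs -n update output. The function name and the list file name are made up, and the loop in the comment deletes files, so look over the list before running anything like it.

```shell
# Sketch: read "cvs -nq update" output on stdin and print the names of the
# files CVS reports as locally modified (status letter M).
cvs_modified_files() {
    sed -n 's/^M //p'
}

# Typical use (not run here; note that it removes files):
#   cvs -nq update -dP 2>/dev/null | cvs_modified_files > modified.list
#   (inspect modified.list)
#   while IFS= read -r f; do
#       rm -f "$f" && cvs -q update "$f" </dev/null
#   done < modified.list
```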

When you have no files left that CVS thinks are modified, do a CVS update on the whole tree and merge it as described above. (You must do this, as the parts of the tree that Mercurial is ignoring will otherwise be out of sync with the parts it's managing.)

If you stored the CVS control files in Mercurial, then the revert will restore them, but your tree will still be inconsistent so you still need to do a proper update and merge immediately.

mq

The basic idea of mq (like quilt) is that it maintains a series of patches against the source tree, to be applied in order. By applying them or removing them one at a time, you can move the tree to any intermediate state; and then you can update the topmost patch, insert a new patch, or whatever.

To see the list of patches:

	% hg qseries
To apply the next patch:
	% hg qpush
To remove the current patch:
	% hg qpop
To merge current working tree changes into the current patch:
	% hg qrefresh
To also update the current patch's change comment:
	% hg qrefresh -e
To collect current working tree changes (if any) into a new patch:
	% hg qnew PATCHNAME 
When there's an mq patch applied, you can't commit. (Doing qrefresh is basically equivalent to committing the current patch.) "hg diff" will show the changes against the last refreshed version of the current patch; to see the complete changes for the current patch (including current changes), use "hg qdiff".

You can delete patches with "hg qrm" and rename them with "hg qmv".

Patches are applied with patch, unfortunately, which means that if they don't apply (which can happen if you or someone else changes something under one) you get .rej files you have to clean up by hand rather than a Mercurial merge.

When a patch is ready to be committed for real, you do "hg qfinish" on it. This removes it from the patch queue and converts it to a normal Mercurial changeset.

To change the ordering of patches, you edit the file .hg/patches/series. If the patches aren't orthogonal you'll have to fix the rejections when you next apply them. (Don't do this with patches that are currently applied.)

Use "hg help mq" to see the full list of mq-related commands.

I'm sure there are better mq tutorials out there.

Using mq

The basic process when using mq is that you start a new patch, edit and hack for a while, use hg qrefresh to commit it (once or many times), and when you're done go on to the next one.

If you find a bug in an earlier patch, you can go back to the patch that introduced it and fix the bug there, creating a new version of the offending patch that no longer contains the bug. (Or you can create a new patch that fixes the bug, but insert it immediately after the patch that created the bug.)

When a patch is ready to be seen by other people, you "finish" it and then it becomes a normal immutable changeset.

One catch is that you can't push or pop the patch queue while you have unsynced (uncommitted) changes. There are two ways around this. One is a separate "stash" extension that lets you put unfinished changes aside while you do something else. The other is to create a new temporary patch holding your unfinished changes, and then later use hg qfold to combine that with the patch the changes were originally meant for.

A variant of this problem is when you discover a bug, open an editor, fix it, and then realize that you wanted to make the edit in an earlier patch. Then you go to pop the queue and it complains that you have a modified file. If the modification in question is the only uncommitted change, the best way to deal with this is to create a new patch for it, then pop to where you wanted it to go and use hg qfold to apply it there.
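
As a sketch, that sequence might look like the following. The function name and the temporary patch name tmp-fixup are arbitrary, and the target patch name is whatever hg qseries shows for the patch you want to amend. If patches between the target and the top touch the same lines, expect rejects to clean up by hand.

```shell
# Sketch: move uncommitted working-tree changes into an earlier patch.
# $1 is the name of the earlier patch, as listed by hg qseries.
fold_into_earlier_patch() {
    target=$1
    # Capture the uncommitted changes in a temporary patch.
    hg qnew tmp-fixup || return 1
    # Pop back until the target is the topmost applied patch; this leaves
    # the temporary patch unapplied, which qfold requires.
    hg qgoto "$target" || return 1
    # Merge the temporary patch into the target.
    hg qfold tmp-fixup
}

# Typical use (not run here):
#   fold_into_earlier_patch fix-foo.diff
#   hg qpush -a
```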