Out of tree file systems

This page is a list of stuff to take note of if you're maintaining an out-of-tree file system for NetBSD. It may not be complete, but it should list at least everything I've changed around while hacking up the VFS layer.

From netbsd-7 to -8

7.99.10 April 2015:
vget() now always returns an unlocked vnode. If you need it locked afterwards (not recommended for e.g. lookup) use vn_lock.
An extra boolean argument has been introduced to vget to make sure unadjusted code fails to build: this should be passed true, or false if LK_NOWAIT is passed. (LK_NOWAIT can still be used to avoid certain waits even though the vnode lock is not acquired; EBUSY is returned if waiting would be needed.)
Also, VOP_LINK has been changed to not unlock the directory vnode, like earlier changes to create, mknod, mkdir, and symlink.
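The vget() calling pattern described above can be sketched in user space. This is a minimal mock, not kernel code: struct mock_vnode, mock_vget(), mock_vn_lock(), mock_vrele(), MOCK_LK_NOWAIT, and get_locked() are hypothetical stand-ins, and the assertion only models the rule stated above (pass true, or false together with LK_NOWAIT).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MOCK_LK_NOWAIT 0x01	/* hypothetical stand-in for LK_NOWAIT */

/* Hypothetical stand-in for struct vnode. */
struct mock_vnode {
	int usecount;	/* references held */
	bool locked;	/* vnode lock held */
	bool busy;	/* taking a reference would require waiting */
};

/* Models the new vget(): takes a reference, never returns the vnode locked. */
static int
mock_vget(struct mock_vnode *vp, int flags, bool waitok)
{
	/* The boolean must agree with the flags, as described above. */
	assert(waitok == ((flags & MOCK_LK_NOWAIT) == 0));
	if (vp->busy && !waitok)
		return EBUSY;
	/* With waitok set, the mock pretends any wait has completed. */
	vp->usecount++;
	return 0;
}

/* Models vn_lock(); in this mock it always succeeds. */
static int
mock_vn_lock(struct mock_vnode *vp)
{
	vp->locked = true;
	return 0;
}

/* Models vrele(): drops one reference. */
static void
mock_vrele(struct mock_vnode *vp)
{
	vp->usecount--;
}

/* Caller that needs the vnode locked: reference first, then lock explicitly. */
static int
get_locked(struct mock_vnode *vp)
{
	int error;

	error = mock_vget(vp, 0, true);
	if (error)
		return error;
	error = mock_vn_lock(vp);
	if (error)
		mock_vrele(vp);	/* drop the reference if locking fails */
	return error;
}
```

In real code the lock step would be vn_lock on the vnode returned by vget; the mock only shows that the two steps are now separate.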

7.99.8 March 2015:
The cred argument of bread() has been dropped. The credentials of the caller are used. This is not expected to pose a problem.

7.99.7 March 2015:
A new vnode cache operation vcache_new() has appeared; see the usage in ffs.

From netbsd-6 to -7

6.99.49 July 2014:
New vnode ops: fallocate and fdiscard. fallocate fills in regions within a file; fdiscard creates holes. These can be stubbed out, and for the time being they are stubbed out in most fses.

6.99.41 May 2014:
There is now a vfs-level vnode cache; you can and should use it instead of cutpasting the ffs vnode cache code like the historical practice has been. Use your favorite (or most closely related) fs as a conversion example; if that fs hasn't been converted yet, ask on tech-kern as there's probably a reason.
(For example, some fses required rekeying the cache; that was added in 6.99.46 in June 2014.)

April 2014:
Make sure your foofs_mount() routine checks for null per-fs data. Most of the existing ones didn't and would cheerfully crash if abused.
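A minimal sketch of such a check, written as a user-space mock. struct foofs_args and foofs_check_mount_args() are hypothetical names; a real foofs_mount() takes more parameters, and the length check is an extra assumption beyond what the text above requires.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-fs mount arguments. */
struct foofs_args {
	int version;
};

/*
 * Argument validation for a hypothetical foofs_mount(). Only the
 * checks on the per-fs data are modeled here.
 */
static int
foofs_check_mount_args(const void *data, const size_t *data_len)
{
	const struct foofs_args *args = data;

	if (args == NULL)
		return EINVAL;	/* no per-fs data supplied */
	if (data_len == NULL || *data_len < sizeof(*args))
		return EINVAL;	/* too short to hold foofs_args */
	return 0;
}
```

The point is simply that the null check comes before any dereference of the per-fs data.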

6.99.38 March 2014:
References to VI_XLOCK, and uses of vwait to wait for it, should be removed. Attempts to use VI_XLOCK to check if the vnode is going away should be replaced with calls to vdead_check(). A custom VOP_LOCK should be adjusted like genfs_lock. Other uses, or uses of VI_CLEAN, are probably wrong; ask on tech-kern.

6.99.34 March 2014:
Any filesystems that try to manually iterate over mnt_vnodelist (like ffs did) should be updated to use the new vfs_vnode_iterator interface. Note that the interface was improved in 6.99.43 (May 2014) -- be sure to use the latest form.

6.99.33 February 2014:
Certain aspects of vnode locks were fixed; the practical upshot for filesystem code seems to be that one should refrain from acquiring the vnode lock in VOP_RECLAIM as it is no longer necessary there. See source-changes-d.
Also, layer fses should be updated to reference layer_lock in their vnode ops tables as per -r1.39 of null_vnops.c; and if your filesystem has its own lock operation (rather than using genfs_lock) you will need to rearrange it matching the genfs_lock changes in -r1.190 of genfs_vnops.c.

6.99.31 February 2014:
The lookup vnode operation should now return the result vnode (*vpp) unlocked. The cache_lookup function has been changed to match. Note that filesystem-level code should refrain from locking and unlocking the result vnode during lookup as this can prompt deadlocks in rename.

6.99.29 and 6.99.30 January 2014:
The create, mknod, mkdir, and symlink vnode operations now lock more sanely: filesystem code should be changed to not lock the result vnode (*vpp) and to not unlock the directory vnode (dvp) either.

6.99.25 October 2013:
vrelel() is private now. Callers that lock v_interlock and then call vrelel() should just call vrele(); callers that do other things are probably wrong. (Seek advice on tech-kern.) Also, the unused 3rd (lwp) argument of vrecycle() was removed.

6.99.24 September 2013:
References to v_specmountpoint on a device vnode (to get the fs mounted on) should be changed to use the accessors spec_node_getmountedfs() and spec_node_setmountedfs().

6.99.16 December 2012:
The bread() and breadn() functions were changed so they never return a buffer on error. Callers must be changed to not correspondingly call brelse() on error.
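The caller-side change can be sketched with a user-space mock. struct mock_buf, mock_bread(), mock_brelse(), and read_block() are hypothetical and much simpler than the real buffer cache interface; the sketch only models the rule that no buffer comes back on error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buf. */
struct mock_buf {
	int blkno;
};

static struct mock_buf mock_storage;	/* single buffer for the sketch */
static int mock_brelse_calls;		/* counts releases for checking */

/* Models the new rule: on error no buffer is returned (*bpp is NULL). */
static int
mock_bread(int blkno, struct mock_buf **bpp)
{
	if (blkno < 0) {
		*bpp = NULL;
		return EIO;
	}
	mock_storage.blkno = blkno;
	*bpp = &mock_storage;
	return 0;
}

/* Models brelse(): must only ever see a valid buffer. */
static void
mock_brelse(struct mock_buf *bp)
{
	assert(bp != NULL);
	mock_brelse_calls++;
}

/* Caller written for the new convention. */
static int
read_block(int blkno, int *out)
{
	struct mock_buf *bp;
	int error;

	error = mock_bread(blkno, &bp);
	if (error)
		return error;	/* no brelse here: there is no buffer */
	*out = bp->blkno;
	mock_brelse(bp);	/* release only on the success path */
	return 0;
}
```

Code written for the old convention would have called brelse in the error branch; under the new rule that call must go.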

6.99.15 November 2012:
The interface to the name cache was changed around to disentangle it from struct componentname and namei internals. This is actually two changes committed on top of each other. The first, in typical usage, changes
	if ((error = cache_lookup(vdp, vpp, cnp)) >= 0)
		return (error);
   
to
	if (cache_lookup(vdp, cnp, NULL, vpp)) {
		return *vpp == NULLVP ? ENOENT : 0;
	}
   
that is, the order of the arguments is fixed to put the result parameter last and the sense of the return value changes. The new return value is either true (for a cache hit) or false (for a cache miss); a hit can be either a negative result, in which case the vnode is null, or a positive one, in which case it is not. If your filesystem supports whiteouts, you must fetch an additional "iswhiteout" result and update cn_flags; see the changes in ufs for details. Most other filesystems can just pass NULL.
The second change passes various members of struct componentname to cache operations instead of the componentname pointer itself. For example, the above call becomes
	if (cache_lookup(vdp, cnp->cn_nameptr, cnp->cn_namelen,
		cnp->cn_nameiop, cnp->cn_flags, NULL, vpp)) {
		return *vpp == NULLVP ? ENOENT : 0;
	}
   
and the changes to other namecache calls are similarly mechanical.

6.99.13 October 2012:
Changes to namei to allow the openat() family of system calls were added. You shouldn't need to change your filesystem for this but it is worth taking note of in case it breaks something.

6.99.10 July 2012:
The semantics of cache_enter() were changed slightly; in particular, several tests that filesystems were previously supposed to perform themselves to see if cache_enter() should be skipped are now done within cache_enter(). If your filesystem makes any or all of these checks itself before calling cache_enter(), please remove them. Note that most of the existing filesystems typically checked a randomly chosen subset of these conditions, not all of them, so if you copied the logic from somewhere you probably don't have all three tests. Note: this change was also quietly added to netbsd-6 during the beta period and is thus in 6.0.

6.99.7 May 2012:
A genfs_rename operation was added to handle locking for rename. You are strongly urged to convert your filesystem's rename locking to use genfs_rename. Most home-rolled solutions are incorrect one way or another. (And recall that ffs had been wrong for years, and we've had to fix zfs too.)

6.99.4 March 2012:
A bunch of kauth-related changes went in, some pertaining to vnodes and filesystems. Check your favorite filesystem for how to adapt, and beware because a number of the initial changes were incorrect. If in doubt, ask tech-kern. (Sorry, better documentation is not really available.)

From netbsd-5 to -6

6.0_BETA2 August 2012:
The VFS change described above under 6.99.10 (adjustments to cache_enter) was pulled up to the netbsd-6 branch owing to accident/miscommunication; it was decided to keep it rather than revert it because other more desirable changes depended on it and the backward compatibility risks were minor.

5.99.62 (and .63) January 2012:
New quota code (again). If you were for some reason trying to support the old ufs-only quota interfaces, or the recent proplib-based quota interface, or any quotas at all, you'll need to do some hacking. Ask tech-kern for help.
Note: it was discovered in November 2012 that the auto-generated prototype for vfs_quotactl() implementations in VFS_PROTOS() was wrong. This slipped by because the only implementation of vfs_quotactl is ufs_quotactl, which isn't covered by a VFS_PROTOS(). The fix will be pulled up to the 6.0_STABLE branch and should be in 6.1.

5.99.56 September 2011:
The handling of NAME_MAX was tidied up. Currently NAME_MAX is 511 but filenames cannot actually exceed 255 characters. If your filesystem uses NAME_MAX, please change it to use a constant belonging to your filesystem instead (e.g. MYFS_NAME_MAX or MYFS_MAXNAMLEN), choose the value of this constant based on the capability of the file system, and, for futureproofing, enforce the limit in VOP_LOOKUP.
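A minimal sketch of the kind of check a lookup routine could apply. The value 255 and the helper myfs_check_namelen() are assumptions for illustration; in a real VOP_LOOKUP the length would come from cn_namelen in struct componentname.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical limit; choose it from what the on-disk format can store. */
#define MYFS_MAXNAMLEN 255

/* Reject names longer than the filesystem's own limit. */
static int
myfs_check_namelen(size_t namelen)
{
	if (namelen > MYFS_MAXNAMLEN)
		return ENAMETOOLONG;
	return 0;
}
```

Doing the check at the top of lookup means the limit holds even if NAME_MAX changes later.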

5.99.55 July 2011:
VOP_BWRITE now takes a vnode as its first argument like all other VOPs. All occurrences of VOP_BWRITE(bp) should be changed to VOP_BWRITE(bp->b_vp, bp), and references to layer_bwrite() can be removed.

5.99.53 June 2011:
UVM locking changed; minor effects on some filesystems, particularly layers. Check genfs for examples.

5.99.51 April 2011:
vflushbuf() can now return an error. Make sure you check for it.

5.99.50 April 2011:
VOP_LINK changed: filesystems are now no longer responsible for checking for cross-device links or links to directories; the FS-independent code does that now. It's recommended on general principles that you KASSERT these properties.
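The suggested assertions might look roughly like this. The sketch is a user-space mock: MOCK_KASSERT maps to assert(), and struct mock_vnode, struct mock_mount, the MOCK_V* types, and myfs_link() are hypothetical stand-ins for the kernel's KASSERT, struct vnode, and a filesystem's link operation.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_KASSERT(e) assert(e)	/* stand-in for KASSERT */

enum mock_vtype { MOCK_VREG, MOCK_VDIR };

/* Hypothetical stand-in for struct mount. */
struct mock_mount {
	int id;
};

/* Hypothetical stand-in for struct vnode. */
struct mock_vnode {
	enum mock_vtype v_type;
	struct mock_mount *v_mount;
};

/* Hypothetical link operation: the FS-independent code did the checks. */
static int
myfs_link(struct mock_vnode *dvp, struct mock_vnode *vp)
{
	MOCK_KASSERT(vp->v_type != MOCK_VDIR);		/* no links to directories */
	MOCK_KASSERT(dvp->v_mount == vp->v_mount);	/* no cross-device links */

	/* ... the directory entry would be created here ... */
	return 0;
}
```

The assertions replace the old error returns: they document the caller's guarantee rather than re-implementing the check.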

5.99.48 March 2011:
New quota code. If you were for some reason trying to support the old ufs-only quota interfaces, or any quotas at all, you'll need to do some hacking. Ask tech-kern for help.

5.99.43 January 2011:
SAVESTART is history. If you were calling relookup with SAVESTART set (that is, calling relookup in your fs's rename code without explicitly clearing SAVESTART from cn_flags) the directory vnode will no longer gain an extra reference from the relookup call and you need to adjust the reference counting accordingly. If you were explicitly clearing SAVESTART before calling relookup you can prune the code that clears SAVESTART. (Most but not all existing fs code explicitly cleared SAVESTART.)
Also make sure that your rename code doesn't drop the last reference to either of the directory vnodes (either to or from) and expect it to magically remain valid. Some legacy code did that, and SAVESTART was apparently originally invented as a workaround. Or something.
The signature of relookup was changed: it now needs an extra dummy integer argument, to make sure all code calling relookup gets examined. Consider fixing your locking so you don't need to use relookup.
(If on the other hand you have FS code that was setting SAVESTART or making namei calls and using ni_startdir, contact me or ask tech-kern for advice.)

5.99.41 November 2010:
The SAVENAME and HASBUF namei flags have been removed. There is now always a buffer (so HASBUF would be always true) and the pathname in struct componentname is always valid in VOP operations. The buffer is no longer exposed as cn_pnbuf. You can safely remove all logic from your FS that frees cn_pnbuf or sets SAVENAME, and any HASBUF-based logic that remains can be made unconditional. If you were using cn_pnbuf for other purposes, contact me or ask tech-kern for advice.

5.99.40 November 2010:
struct pathbuf was added and the signature of NDINIT() changed to require a pathbuf rather than a string and uio_seg. See pathbuf(9) and namei(9), and example code all over the kernel. Calls to namei_simple_* are not affected.

5.99.38 July 2010:
The VI_FREEING vnode flag was killed off. On-disk inodes should be freed in the reclaim routine. See the ffs code.

5.99.34 July 2010:
vlockmgr() was killed off. Uses of vlockmgr() in file systems should be replaced with VOP_LOCK or VOP_UNLOCK.

5.99.32 June 2010:
The flags argument to VOP_UNLOCK was removed as it served no purpose.

5.99.31 June 2010:
Vnode locks are no longer allowed to be recursive. Hopefully your FS wasn't relying on this.

5.99.30 June 2010:
Layered FSes now pass the locking ops down to the leaf FS. The v_vnlock member of struct vnode is no longer used. If you have a layer FS, check the nullfs diffs; if not, you shouldn't be affected.

5.99.19 September 2009:
The VFS-level lookup() function was actually an abusive interface used only by nfsd. It has been removed; on the off chance you have a network FS that was using it, please look into more appropriate ways of calling namei and if necessary get in touch with me or post on tech-kern. Please don't use the private interfaces currently exposed for nfsd as they aren't meant to be stable.

January 2010:
The VATTR_NULL, VREF, VHOLD, and HOLDRELE vnode macros were killed off. Use the lowercase function forms.

5.99.15 June 2009:
The functions namei_simple_kernel and namei_simple_user were added to cover the most common cases of namei. If your FS calls namei and the usage matches these functions, switching to them will insulate you from upcoming namei interface changes.

February 2009:
ffs+softupdates was trashed and some of its supporting material ripped out along with it, particularly the gross "bioops" function table for FS callbacks for buffer operations. If your FS was relying on this, (1) please post to tech-kern to get started on a better interface and (2) we apologize for allowing this crap to exist in the form it did.

5.99.7 January 2009:
time_t was changed to be 64 bits wide. So was dev_t. Make sure your on-disk structures use explicitly sized types and do not rely on the sizes of system types.
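A sketch of what explicitly sized on-disk fields can look like. struct myfs_dinode, its field names, and the two helpers are hypothetical; the sketch assumes a C11 compiler for _Static_assert and ignores byte order, which a real on-disk format also has to pin down.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical on-disk inode fragment with explicitly sized fields. */
struct myfs_dinode {
	int64_t  di_mtime;	/* seconds; 64 bits regardless of time_t */
	uint64_t di_rdev;	/* device number; 64 bits regardless of dev_t */
};

/* The layout must not change when system types change. */
_Static_assert(sizeof(struct myfs_dinode) == 16, "on-disk layout changed");

/* Convert explicitly between the in-memory time_t and the on-disk field. */
static void
myfs_store_mtime(struct myfs_dinode *dip, time_t t)
{
	dip->di_mtime = (int64_t)t;
}

static time_t
myfs_load_mtime(const struct myfs_dinode *dip)
{
	return (time_t)dip->di_mtime;
}
```

Keeping the conversions in small helpers makes it obvious where a system type crosses into the on-disk format.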