Tutorial On Rump Kernel Servers and Clients

Antti Kantee <pooka@NetBSD.org>

Introduction
Important concepts and a warmup exercise
Userspace cgd encryption
Networking
Emulating makefs
Master class: NFS server
Further ideas

Introduction

The rump anykernel architecture makes it possible to run highly componentized kernel code configurations in userspace processes. Coupled with the rump sysproxy facility, it is possible to run loosely distributed client-server "mini-operating systems". Since there is minimal configuration and the bootstrap time is measured in milliseconds, these environments are very cheap to set up, use, and tear down on demand.

This document acts as a tutorial on how to configure and use unmodified NetBSD kernel drivers as userspace services with utilities available from the NetBSD base system. As part of this, it presents various use cases. One uses the kernel cryptographic disk driver (cgd) to encrypt a partition. Another one demonstrates how to operate an FFS server for editing the contents of a file system even though your user account does not have privileges to use the host's mount() system call. Additionally, using a userspace TCP/IP server with an unmodified web browser is detailed.

The minimum NetBSD source version which supports everything described in this document is -current starting from mid-March 2011 (NetBSD 5.99.48 and later). The tutorial applies to all hardware architectures supported by NetBSD.

Important concepts and a warmup exercise

This section goes over basic concepts which help to understand how to start and use rump servers and clients.

Service location specifiers

A rump kernel service location is specified with a URL. Currently, two types of connections are supported: TCP and local domain (i.e. file system) sockets. TCP connections use standard TCP/IP addressing, and the URL is of the format tcp://ip.address:port/. A local domain socket binds to a pathname on the local system. The URL format is unix://path and accepts both relative and absolute paths. Note that absolute paths require three leading slashes, for example unix:///tmp/cgdserv.
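As a small illustration of these rules, the POSIX sh function below classifies a service location URL by its shape. It is a hypothetical helper written for this tutorial, not part of NetBSD, and it only checks the general form of the URL; it does not validate the address or the port.

```shell
# Hypothetical helper (not part of NetBSD): classify a rump service
# location URL by its shape. Prints the kind and returns non-zero for
# anything it does not recognize.
rump_url_kind() {
	case "$1" in
	tcp://*:*/)
		echo tcp ;;
	unix:///?*)
		# unix:// followed by an absolute path, e.g. unix:///tmp/cgdserv
		echo unix-absolute ;;
	unix://?*)
		# unix:// followed by a relative path, e.g. unix://rumpserver
		echo unix-relative ;;
	*)
		echo invalid
		return 1 ;;
	esac
}
```

Note how the missing third slash changes the meaning: unix://tmp/cgdserv is classified as the relative path tmp/cgdserv, not as /tmp/cgdserv.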

Both the client and the server require a service URL to be specified. For the server, the URL designates where the server should listen for incoming connections, and for the client it specifies which server the client should connect to.

Servers

Kernel services are provided by rump servers. Generally speaking, any driver-like kernel functionality can be offered by a rump server. Examples include file systems, networking protocols, the audio subsystem and USB hardware device drivers. A rump server is entirely standalone; running one does not require, for example, the creation and maintenance of a root file system.

rump_server is a component-oriented rump kernel server. It can use any combination of available NetBSD kernel components in userspace. In its most basic mode, rump_server offers only bare-bones functionality such as kernel memory allocation and thread support, which on its own is generally not useful for applications. Components are loaded dynamically and are specified on the command line using a linker-like syntax. For example, a server with FFS capability needs VFS support and the FFS component: rump_server -lrumpvfs -lrumpfs_ffs. A bare-bones rump_server does not have file system support and will not even be able to perform open() (note: networking servers do not require VFS support). The -l option uses the host's dlopen() routine to load and link components dynamically. It is also possible to use the NetBSD kernel loader/linker to load ELF objects by supplying -m instead, but for simplicity this article always uses -l.

The URL the server listens to is supplied as the last argument on the command line. The URL follows the format described in the previous section.

Other options control things like number of virtual CPUs configured to the rump server and maximum amount of host memory the virtual kernel will allocate. They are documented in the manual page of rump_server.
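Since each component is a library named librump followed by the component name and is loaded with -l, longer command lines can be assembled from short component names. The function below is a hypothetical convenience sketch, not something shipped with NetBSD; the component names are the ones used in this tutorial.

```shell
# Hypothetical helper: turn short component names into rump_server
# -l options, e.g. "vfs fs_ffs" -> "-lrumpvfs -lrumpfs_ffs".
rump_lflags() {
	_flags=
	for _c in "$@"; do
		# Append with a separating space only when _flags is non-empty.
		_flags="${_flags:+$_flags }-lrump$_c"
	done
	printf '%s\n' "$_flags"
}
```

For example, rump_server $(rump_lflags vfs fs_ffs) unix://ffsserver starts an FFS-capable server with the URL as the last argument. The command substitution is deliberately unquoted so that it splits into separate options, which is safe because component names contain no whitespace.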

Clients

Rump clients are programs which interface with the kernel servers. They can either be used to configure the server or act as consumers of the functionality provided by the server. Configuring the IP address for a TCP/IP server is an example of the former, while web browsing is an example of the latter. Clients can be considered to be the userland of a rump kernel, but unlike in a usermode operating system they are not confined to a specific file system setup, and are simply run from the hosting operating system.

A client determines the server it connects to by examining the URL in the RUMP_SERVER environment variable.
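Because the server is selected purely through the environment, RUMP_SERVER can also be set for a single command instead of being exported into the shell. The wrapper below is a hypothetical sketch of this; it is equivalent to writing RUMP_SERVER=url command directly.

```shell
# Hypothetical wrapper: run one external command with RUMP_SERVER set
# only in that command's environment. The calling shell's own
# RUMP_SERVER (if any) is left untouched.
rump_run() {
	_url=$1
	shift
	RUMP_SERVER=$_url "$@"
}
```

For example, rump_run unix://rumpserver rump.sysctl kern.hostname queries one server without changing which server later commands in the same shell talk to.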

A client runs as a hybrid in both the host kernel and rump kernel. It uses essential functionality from the rump kernel, while all non-essential functionality comes from the host kernel. The direct use of the host's resources for non-essential functionality enables very lightweight services and is what sets rump apart from other forms of virtualization. The set of essential functionality depends on the application. For example, for ls, fetching a directory listing with getdents() is essential functionality, while allocating the memory into which the directory contents are fetched is non-essential.

The NetBSD base system contains applications which are preconfigured to act as rump clients. This means that just setting RUMP_SERVER will cause these applications to perform their essential functionality on the specified rump kernel server. These applications are distinguished by a "rump." prefix in their command name. At the time of writing, the list of pure rump clients is:

rump.cgdconfig  rump.halt       rump.modunload  rump.raidctl    rump.traceroute
rump.dd         rump.ifconfig   rump.netstat    rump.route      
rump.dhcpclient rump.modload    rump.ping       rump.sockstat   
rump.envstat    rump.modstat    rump.powerd     rump.sysctl

Additionally, almost any other dynamically linked binary can act as a rump client, but it is up to the user to specify a correct configuration for hijacking the application's essential functionality. Hijacking is demonstrated in later sections of this document.

Client credentials and access control

The current scheme gives all connecting clients root credentials. It is recommended to take precautions which prevent unauthorized access. For a unix domain socket it is enough to prevent access to the socket using file system permissions. For TCP/IP sockets the only available means is to prevent network access to the socket with the use of firewalls. More fine-grained access control based on cryptographic credentials may be implemented at a future date.

Your First Server

Putting everything together, we're ready to start our first rump server. After startup, we examine the autogenerated hostname it was given, and halt the server. We also observe that the socket is removed when the server exits.

golem> rump_server unix://rumpserver
golem> ls -l rumpserver 
srwxr-xr-x  1 pooka  users  0 Mar 11 14:49 rumpserver
golem> sysctl kern.hostname
kern.hostname = golem.localhost
golem> export RUMP_SERVER=unix://rumpserver
golem> rump.sysctl kern.hostname
kern.hostname = rump-06341.golem.localhost.rumpdomain
golem> rump.halt
golem> rump.sysctl kern.hostname
rump.sysctl: prog init failed: No such file or directory
golem> ls -l rumpserver
ls: rumpserver: No such file or directory

As an exercise, try the above, but halt with -d to produce a core dump. Examine the core with gdb and especially look at the various threads that were running (in gdb: thread apply all bt). Also, try to create another core with kill -ABRT. Notice that you will have a stale socket in the file system when the server is violently killed. You can remove it with rm.

As a final exercise, start the server with -s. This causes the server to not detach from the console. Then kill it either with SIGTERM from another window (the default signal sent by kill) or by pressing Ctrl-C. You will notice that the server reboots itself cleanly in both cases. If it had file systems, those would be unmounted too. These features are useful for quick iteration when debugging and developing kernel code.

In case you want to use a debugger to further examine later cases we go over in this tutorial, it is recommended you install debugging versions of rump components. That can be done simply by going into src/sys/rump and running make DBG=-g cleandir dependall and after that make install as root. You can also install the debugging versions to an alternate directory using make DESTDIR=/my/dir install and run the code with LD_LIBRARY_PATH set to /my/dir. This scheme also allows you to run kernel servers with non-standard code modifications on a non-privileged account.

Userspace cgd encryption

The cryptographic disk driver, cgd, provides an encrypted view of a block device. The implementation is kernel-based. This makes it convenient and efficient to layer the cryptodriver under a file system so that all file system disk access is encrypted. However, using a kernel driver requires that the code is loaded into the kernel and that a user has the appropriate privileges to configure and access the driver.

Occasionally, it is desirable to encrypt a file system image before distribution. Assume you have a USB image, i.e. one that can boot and run directly from USB media. The image can for example be something you created yourself, or even one of the standard USB installation images offered by NetBSD. You also have a directory tree with confidential data you wish to protect with cgd. This example demonstrates how to use a rump cgd server to encrypt your data. This approach, as opposed to using a driver in the host kernel, has the following properties:

uses out-of-the-box tools on any NetBSD installation
does not require any special kernel drivers
does not require superuser access
is portable to non-NetBSD systems (although this requires some work)

While there are multiple steps with a fair number of details, in case you plan on doing this regularly, it is possible to script them and automate the process. It is recommended that you follow these instructions as non-root to avoid accidentally overwriting a cgd partition on your host due to a mistyped command.

Let's start with the USB disk image you have. It will have a disklabel such as the following:

golem> disklabel usb.img 
# usb.img:
type: unknown
disk: USB image
label: 
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 1040
total sectors: 1048576
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

16 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:    981792        63     4.2BSD   1024  8192     0  # (Cyl.      0*-    974*)
 b:     66721    981855       swap                     # (Cyl.    974*-   1040*)
 c:   1048513        63     unused      0     0        # (Cyl.      0*-   1040*)
 d:   1048576         0     unused      0     0        # (Cyl.      0 -   1040*)

Our goal is to add another partition after the existing ones to contain the cgd-encrypted data. This will require extending the file in which the image resides, and naturally a USB mass storage device large enough to fit the new image.

First, we create a file system image using the makefs command:

golem> makefs unencrypted.ffs preciousdir
Calculated size of `unencrypted.ffs': 12812288 bytes, 696 inodes
Extent size set to 8192
unencrypted.ffs: 12.2MB (25024 sectors) block size 8192, fragment size 1024
        using 1 cylinder groups of 12.22MB, 1564 blks, 768 inodes.
super-block backups (for fsck -b #) at:
 32,
Populating `unencrypted.ffs'
Image `unencrypted.ffs' complete

Then, we figure out the image size in disk sectors:

golem> expr `stat -f %z unencrypted.ffs` / 512
25024
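The expr division above truncates. The image makefs produced here is exactly 12812288 bytes, an exact multiple of 512, so nothing is lost, but if you script this step for arbitrary images it is safer to round up. A minimal sketch using POSIX shell arithmetic; the function name is made up for this example:

```shell
# Hypothetical helper: print the number of 512-byte sectors needed to
# hold file $1, rounding up so that a partial last sector is counted.
sectors_of() {
	_bytes=$(wc -c < "$1") || return 1
	# $_bytes may carry leading blanks from wc; arithmetic ignores them.
	echo $(( ($_bytes + 511) / 512 ))
}
```

Running sectors_of unencrypted.ffs gives the same 25024 as the expr command for this image.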

We then edit the existing image label so that there is a spare partition large enough to hold the image. We need to edit "total sectors" and the "c" and "d" partitions. We also need to create the "e" partition. Make sure you use "unknown" instead of "unused" as the fstype for partition e.

golem> disklabel -re usb.img 
# usb.img:
type: unknown
disk: USB image
label: 
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 1040
total sectors: 1073600
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

16 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:    981792        63     4.2BSD   1024  8192     0  # (Cyl.      0*-    974*)
 b:     66721    981855       swap                     # (Cyl.    974*-   1040*)
 c:   1073537        63     unused      0     0        # (Cyl.      0*-   1065*)
 d:   1073600         0     unused      0     0        # (Cyl.      0 -   1065*)
 e:     25024   1048576    unknown      0     0        # (Cyl.   1040*-   1065*)
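The new values follow from simple arithmetic on the original label: the "e" partition starts where the old disk ended, the disk grows by the size of the image, "c" keeps its offset of 63, and "d" spans the whole disk. The sketch below prints these numbers; it is a hypothetical helper that assumes the layout of this particular label ("d" starting at sector 0 and the new partition appended at the old end of the disk).

```shell
# Hypothetical helper: given the old total sector count ($1), the size
# of the new partition in sectors ($2) and the offset of "c" ($3),
# print the values to enter in the edited label.
new_label_values() {
	_old_total=$1 _esize=$2 _coff=$3
	_new_total=$(( _old_total + _esize ))
	echo "total sectors: $_new_total"
	echo "c: size $(( _new_total - _coff )) offset $_coff"
	echo "d: size $_new_total offset 0"
	echo "e: size $_esize offset $_old_total"
}
```

Calling new_label_values 1048576 25024 63 reproduces the values in the edited label above: 1073600 total sectors, "c" of 1073537 sectors at offset 63, "d" of 1073600 sectors, and "e" of 25024 sectors at offset 1048576.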

Now, it is time to start a rump server for writing the encrypted data to the image. We need to note that a rump kernel has a local file system namespace and therefore cannot in its natural state see files on the host. However, the -d parameter to rump_server can be used to map files from the host into the rump kernel file system namespace. We start the server in the following manner:

golem> export RUMP_SERVER=unix:///tmp/cgdserv
golem> rump_server -lrumpvfs -lrumpkern_crypto -lrumpdev -lrumpdev_disk	\
    -lrumpdev_cgd -d key=/dk,hostpath=usb.img,disklabel=e ${RUMP_SERVER}

This maps partition "e" from the disklabel on usb.img to the key /dk inside the rump kernel. In other words, accessing sector 0 from /dk in the rump kernel namespace will access sector 1048576 on usb.img. The image file is also automatically extended so that the size is large enough to contain the entire partition.
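The mapping is a plain offset calculation. The sketch below is a hypothetical helper, assuming 512-byte sectors as in the label above, which converts a sector number inside /dk to a byte offset in usb.img.

```shell
# Hypothetical helper, assuming 512-byte sectors: print the byte offset
# in the host file of sector $2 of a partition that starts at sector $1.
dk_host_offset() {
	echo $(( ($1 + $2) * 512 ))
}
```

dk_host_offset 1048576 0 prints 536870912, the byte in usb.img where /dk begins. Passing the partition size instead, dk_host_offset 1048576 25024, gives the end of the partition, 549683200 bytes, which is the size the image file is extended to and also the amount of data later copied to the USB stick.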

Note that everyone who has access to the server socket will have root access to the kernel server, and hence the data you are going to encrypt. In case you are following these instructions on a multiuser server, it is a good idea to make sure the socket is in a directory only you have access to (directory mode 0700).
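One way to arrange this is sketched below: a hypothetical helper that creates a directory accessible only by the current user and prints a unix:// URL for a socket inside it. Given an absolute directory, the resulting URL has the three leading slashes that absolute paths require.

```shell
# Hypothetical helper: create a private (mode 0700) directory $1 and
# print a unix:// URL for a server socket named $2 (default
# "rumpserver") inside it.
rump_private_url() {
	_dir=$1
	# Set the umask in a subshell so the caller's umask is not changed.
	( umask 077 && mkdir -p "$_dir" ) || return 1
	# Also tighten the mode in case the directory already existed.
	chmod 700 "$_dir" || return 1
	printf 'unix://%s/%s\n' "$_dir" "${2:-rumpserver}"
}
```

For example, export RUMP_SERVER=$(rump_private_url $HOME/.rump cgdserv) before starting rump_server keeps the socket out of reach of other local users.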

We can now verify that we get a zero-filled partition of the right size:

golem> rump.dd if=/dk bs=64k > emptypart
195+1 records in
195+1 records out
12812288 bytes transferred in 0.733 secs (17479246 bytes/sec)
golem> hexdump -x emptypart 
0000000    0000    0000    0000    0000    0000    0000    0000    0000
*
0c38000

In the above example we could pipe rump.dd output directly to hexdump. However, running two separate commands also conveniently demonstrates that we get the right amount of data from /dk.
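The check can also be made explicit. 12812288 bytes is exactly 25024 sectors of 512 bytes (0xc38000, the final offset printed by hexdump), which matches the size of partition "e". The hypothetical helpers below use only POSIX tools to test both the size and that every byte is zero.

```shell
# Hypothetical helper: succeed if file $1 is exactly $2 sectors of
# 512 bytes. The outer arithmetic strips any blanks printed by wc.
has_sectors() {
	[ "$(( $(wc -c < "$1") ))" -eq $(( $2 * 512 )) ]
}

# Hypothetical helper: succeed if file $1 contains only NUL bytes,
# i.e. deleting every NUL leaves nothing behind.
is_zero_filled() {
	[ "$(( $(tr -d '\000' < "$1" | wc -c) ))" -eq 0 ]
}
```

With these, has_sectors emptypart 25024 && is_zero_filled emptypart && echo ok replaces the visual inspection of the hexdump output.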

If we were to dd our unencrypted.ffs to /dk, we would have added a regular unencrypted partition to the image. The next step is to configure a cgd so that we can write encrypted data to the partition. In this example we'll use a password-based key, but you are free to use anything that is supported by cgdconfig.

golem> rump.cgdconfig -g aes-cbc > usb.cgdparams
golem> cat usb.cgdparams 
algorithm aes-cbc;
iv-method encblkno1;
keylength 128;
verify_method none;
keygen pkcs5_pbkdf2/sha1 {
        iterations 325176;
        salt AAAAgGc4DWwqXN4t0eapskSLWTs=;
};

Note that if you have a fast machine and wish to use the resulting encrypted partition on slower machines, it is a good idea to edit "iterations". The value is automatically calibrated by cgdconfig so that encryption key generation takes about one second on the platform the params file is generated with. This can take significantly longer on slower systems. (More information about the iteration count is available here.)
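Editing the value can be scripted with sed. The helper below is a hypothetical sketch that rewrites the iterations line of a params file in the format shown above and writes the result to standard output; the new count is yours to choose. Since the key is derived from the password using this count, make the change before configuring cgd0 in the next step, and keep in mind that a lower count also makes brute-forcing the password cheaper.

```shell
# Hypothetical helper: print params file $1 with the PBKDF2 iteration
# count replaced by $2. Only the number on the "iterations" line
# changes; indentation and all other lines are kept as they are.
set_iterations() {
	sed 's/^\([[:space:]]*iterations[[:space:]][[:space:]]*\)[0-9][0-9]*;/\1'"$2"';/' "$1"
}
```

For example, set_iterations usb.cgdparams 100000 > usb.cgdparams.new, followed by checking the result and renaming it over the original, where 100000 is only an illustrative value.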

The next step is to configure the cgd device using the paramsfile. Since we are using password-based encryption we will be prompted for a password. Enter any password you want to use to access the data later.

golem> rump.cgdconfig cgd0 /dk usb.cgdparams
/dk's passphrase:

If we repeat the dd test in the encrypted partition we will get a very different result than above. This is expected, since now we have an encrypted view of the zero-filled partition.

golem> rump.dd if=/dev/rcgd0d | hexdump -x | sed 8q
0000000    9937    5f33    25e7    c341    3b67    c411    9d73    645c
0000010    5b7c    23f9    b694    e732    ce0a    08e0    9037    2b2a
*
0000200    0862    ee8c    eafe    b21b    c5a3    4381    cdb5    2033
0000210    5b7c    23f9    b694    e732    ce0a    08e0    9037    2b2a
*
0000400    ef06    099d    328d    a35d    f4ab    aac0    6aba    d673
0000410    5b7c    23f9    b694    e732    ce0a    08e0    9037    2b2a

NOTE: The normal rules for the raw device names apply, and the correct device path is /dev/rcgd0c on non-x86 archs.

To encrypt our image, we simply need to dd it to the cgd partition.

golem> dd if=unencrypted.ffs bs=64k | rump.dd of=/dev/rcgd0d bs=64k
195+1 records in
195+1 records out
12812288 bytes transferred in 0.890 secs (14395829 bytes/sec)
195+1 records in
195+1 records out
12812288 bytes transferred in 0.896 secs (14299428 bytes/sec)

We have now successfully written an encrypted version of the file system to the image file and can proceed to shut down the rump server. This makes sure all rump kernel caches are flushed.

golem> rump.halt
golem> unset RUMP_SERVER

You will need to make sure the cgd params file is available on the platform you intend to use the image on. There are multiple ways to do this. It is even safe to offer the params file for download with the image; just make sure the password is not available for download. Notably, though, you will be telling everyone how the image was encrypted and therefore lose the benefit of two-factor authentication.

In this example we use fs-utils (the latest version is available from othersrc) to copy the file to the unencrypted "a" partition. Like other utilities in this tutorial, fs-utils works purely in userspace and does not require special privileges or kernel support.

golem> fsu_put usb.img%DISKLABEL:a% usb.cgdparams root/
golem> fsu_ls usb.img%DISKLABEL:a% -l root/usb.cgdparams 
-rw-r--r--  1 pooka  users  175 Feb  9 17:50 root/usb.cgdparams
golem> fsu_chown usb.img%DISKLABEL:a% 0:0 root/usb.cgdparams
golem> fsu_ls usb.img%DISKLABEL:a% -l root/usb.cgdparams 
-rw-r--r--  1 root  wheel  175 Feb  9 17:50 root/usb.cgdparams

Alternatively, we could use the method described later in this document which works purely with base system utilities.

We are ready to copy the image to a USB stick. This step should be executed with appropriate privileges for raw writes to USB media. If USB access is not possible on the same machine, the image may be copied over the network to a suitable machine.

golem# dd if=usb.img of=/dev/rsd0d bs=64k
8387+1 records in
8387+1 records out
549683200 bytes transferred in 122.461 secs (4488638 bytes/sec)

Finally, we can boot the target machine from the USB stick, configure the encrypted partition, mount the file system, and access the data. Note that to perform these operations we need root privileges on the target machine, since we are using the in-kernel drivers.

demogorgon# cgdconfig cgd0 /dev/sd0e /root/usb.cgdparams
/dev/sd0e's passphrase:
demogorgon# disklabel cgd0
# /dev/rcgd0d:
type: cgd
disk: cgd
label: fictitious
flags:
bytes/sector: 512
sectors/track: 2048
tracks/cylinder: 1
sectors/cylinder: 2048
cylinders: 12
total sectors: 25024
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

4 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:     25024         0     4.2BSD      0     0     0  # (Cyl.      0 -     12*)
 d:     25024         0     unused      0     0        # (Cyl.      0 -     12*)
disklabel: boot block size 0
disklabel: super block size 0
demogorgon# mount /dev/cgd0a /mnt
demogorgon#

[Figure: the result on a real hardware platform]

Happy cgd'ing!

Networking

This section explains how to run any dynamically linked networking program against a rump TCP/IP stack without requiring any modifications to the application, not even recompilation. The application we use in this example is the Firefox browser. It is an interesting application for multiple reasons. Segregating the web browser to its own TCP/IP stack is an easy way to increase monitoring and control over what kind of connections the web browser makes. It is also an easy way to get some increased privacy protection (assuming the additional TCP/IP stack can have its own external IP). Finally, a web browser is largely "connectionless", meaning that once a page has been loaded a TCP/IP connection can be discarded. We use this property to demonstrate killing and restarting the TCP/IP stack from under the application.

A rump server with TCP/IP capability is required. If the plan is to access the internet, the virt interface must be present in the rump kernel and the host kernel must have support for tap and bridge. You also must have the appropriate privileges for configuring the setup: while rump kernels do not themselves require privileges, they cannot magically access host resources without the appropriate privileges. If you do not want to access the internet, using the shmif interface is enough and no privileges are required. However, for purposes of this tutorial we will assume you want to access the internet.

Finally, if there is a desire to configure the rump TCP/IP stack with DHCP, the rump kernel must support bpf. Since bpf is accessed via a file system device node, VFS support is required in this case (without bpf there is no need for file system support). Putting everything together, the rump kernel command line consists of the TCP/IP networking components (-lrumpnet -lrumpnet_net -lrumpnet_netinet), the components needed for bpf support (-lrumpvfs -lrumpdev -lrumpdev_bpf) and the virt(4) interface (-lrumpnet_virtif):

rump_server -lrumpnet -lrumpnet_net -lrumpnet_netinet \
    -lrumpvfs -lrumpdev -lrumpdev_bpf \
    -lrumpnet_virtif

So, to start the TCP/IP server execute the following. Make sure RUMP_SERVER stays set in the shell you want to use to access the rump kernel.

golem> export RUMP_SERVER=unix:///tmp/netsrv
golem> rump_server -lrumpnet -lrumpnet_net -lrumpnet_netinet -lrumpvfs \
    -lrumpdev -lrumpdev_bpf -lrumpnet_virtif ${RUMP_SERVER}

The TCP/IP server is now running and waiting for clients at RUMP_SERVER. For applications to be able to use it, we must do what we do to a regular host kernel TCP/IP stack: configure it. This is discussed in the next section.

Configuring the TCP/IP stack

A kernel mode TCP/IP stack typically has access to networking hardware for sending and receiving packets, so first we must make sure the rump TCP/IP server has the same capability. The canonical way is to use bridging and we will present that here. An alternative is to use the host kernel to route the packets, but that is left as an exercise to the reader. In both cases, the rump kernel sends and receives external packets via a /dev/tap<n> device node. The rump kernel must have read-write access to this device node. The details are up to you, but the recommended way is to use appropriate group privileges.

To create a tap interface and attach it via bridge to a host Ethernet interface we execute the following commands. You can attach as many tap interfaces to a single bridge as you like. For example, if you run multiple rump kernels on the same machine, adding all the respective tap interfaces on the same bridge will allow the different kernels to see each others' Ethernet traffic.

Note that the actual interface names will vary depending on your system and which tap interfaces are already in use.

golem# ifconfig tap0 create
golem# ifconfig tap0 up
golem# ifconfig tap0
tap0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        address: f2:0b:a4:f1:da:00
        media: Ethernet autoselect
golem# ifconfig bridge0 create
golem# brconfig bridge0 add tap0 add re0
golem# brconfig bridge0 up
golem# brconfig bridge0
bridge0: flags=41<UP,RUNNING>
        Configuration:
                priority 32768 hellotime 2 fwddelay 15 maxage 20
                ipfilter disabled flags 0x0
        Interfaces:
                re0 flags=3<LEARNING,DISCOVER>
                        port 2 priority 128
                tap0 flags=3<LEARNING,DISCOVER>
                        port 4 priority 128
        Address cache (max cache: 100, timeout: 1200):
                b2:0a:53:0b:0e:00 tap0 525 flags=0<>
                go:le:ms:re:0m:ac re0 341 flags=0<>
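If you set this up repeatedly, the host-side steps can be collected into a small script. The sketch below is hypothetical: the function names and the DRYRUN variable are inventions for this example, and without DRYRUN it must be run with the privileges required by ifconfig and brconfig. With DRYRUN=1 it only prints the commands, which is handy for checking them first.

```shell
# Print the command instead of running it when DRYRUN=1.
maybe_run() {
	if [ "${DRYRUN:-0}" = 1 ]; then
		echo "$*"
	else
		"$@"
	fi
}

# Hypothetical sketch: create tap$1, attach it together with the host
# interface $2 to bridge$3 (default bridge0), and bring everything up.
rump_bridge_setup() {
	_tap=tap$1 _if=$2 _br=bridge${3:-0}
	maybe_run ifconfig "$_tap" create || return 1
	maybe_run ifconfig "$_tap" up || return 1
	maybe_run ifconfig "$_br" create || return 1
	maybe_run brconfig "$_br" add "$_tap" add "$_if" || return 1
	maybe_run brconfig "$_br" up || return 1
	# Inside the rump kernel, virt$1 will use this tap$1.
	echo "rump kernel interface: virt$1"
}
```

DRYRUN=1 rump_bridge_setup 0 re0 prints the same sequence of commands used above. When adding a second tap interface to a bridge that already exists, the "ifconfig bridge create" step fails and should be skipped, so this sketch covers only the first setup.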

That takes care of support on the host side. The next task is to create an interface within the rump kernel which uses the tap interface we just created. If you are not using tap0, note that virt<n> always corresponds to the host's tap<n>.

golem> rump.ifconfig virt0 create
golem> rump.ifconfig virt0
virt0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
        address: b2:0a:bb:0b:0e:00

In case you do not have permission to open the corresponding tap device on the host, or the host's tap interface has not been created, you will get an error from ifconfig when trying to create the virt interface.

The rump kernel interface now exists. The final step is to configure an address and routing. If there is DHCP support on the network you bridged the rump kernel to, you can simply run rump.dhcpclient:

golem> rump.dhcpclient virt0
virt0: adding IP address 192.168.2.125/24
virt0: adding route to 192.168.2.0/24
virt0: adding default route via 192.168.2.1
lease time: 172800 seconds (2.00 days)

If there is no DHCP service available, you can do the same manually with the same result.

golem> rump.ifconfig virt0 inet 192.168.2.125 netmask 0xffffff00
golem> rump.route add default 192.168.2.1
add net default: gateway 192.168.2.1
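The netmask 0xffffff00 is the /24 prefix length reported by dhcpclient, written as a 32-bit mask. If you script the manual configuration, the conversion can be done in the shell. The sketch below is a hypothetical helper that works octet by octet, so only small integers are involved regardless of the shell's integer width; it assumes a prefix length between 0 and 32.

```shell
# Hypothetical helper: convert a prefix length (0-32) to the hex
# netmask notation used above, e.g. 24 -> 0xffffff00.
prefix_to_netmask() {
	_p=$1 _mask=0x
	for _i in 0 1 2 3; do
		# Number of one bits that fall into this octet, clamped to 0..8.
		_bits=$(( _p - 8 * _i ))
		[ "$_bits" -gt 8 ] && _bits=8
		[ "$_bits" -lt 0 ] && _bits=0
		# An octet with n leading one bits has the value 256 - 2^(8-n).
		_mask=$_mask$(printf '%02x' $(( 256 - (1 << (8 - _bits)) )))
	done
	echo "$_mask"
}
```

For example, rump.ifconfig virt0 inet 192.168.2.125 netmask $(prefix_to_netmask 24) is equivalent to the ifconfig command above.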

You should now have network access via the rump kernel. You can verify this with a simple ping.

golem> rump.ping www.NetBSD.org
PING www.NetBSD.org (204.152.190.12): 56 data bytes
64 bytes from 204.152.190.12: icmp_seq=0 ttl=250 time=169.102 ms
64 bytes from 204.152.190.12: icmp_seq=1 ttl=250 time=169.279 ms
64 bytes from 204.152.190.12: icmp_seq=2 ttl=250 time=169.633 ms
^C
----www.NetBSD.org PING Statistics----
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 169.102/169.338/169.633/0.270 ms

If everything is working, you should see roughly the same latency as with the host networking stack.

golem> ping www.NetBSD.org
PING www.NetBSD.org (204.152.190.12): 56 data bytes
64 bytes from 204.152.190.12: icmp_seq=0 ttl=250 time=169.134 ms
64 bytes from 204.152.190.12: icmp_seq=1 ttl=250 time=169.281 ms
64 bytes from 204.152.190.12: icmp_seq=2 ttl=250 time=169.497 ms
^C
----www.NetBSD.org PING Statistics----
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 169.134/169.304/169.497/0.183 ms
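To compare the two stacks without reading the statistics by eye, the average round-trip time can be extracted from the summary line. A hypothetical sketch, assuming the "round-trip min/avg/max/stddev = ..." summary format shown above:

```shell
# Hypothetical helper: read ping output on stdin and print the average
# round-trip time from the "round-trip min/avg/max/stddev" line.
ping_avg() {
	# Splitting the summary line on "/" makes the avg value field 5.
	awk -F/ '/^round-trip/ { print $5 }'
}
```

Running rump.ping -c 3 www.NetBSD.org | ping_avg and ping -c 3 www.NetBSD.org | ping_avg then prints one number per stack; for the runs above these would be 169.338 and 169.304.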

Running applications

We are now able to run arbitrary unmodified applications using the TCP/IP stack provided by the rump kernel. We just have to set LD_PRELOAD to instruct the dynamic linker to load the rump hijacking library. Also, now is a good time to make sure RUMP_SERVER is still set and points to the right place.

golem> export LD_PRELOAD=/usr/lib/librumphijack.so

Congratulations, that's it. Any application you run from the shell in which you set the variables will use the rump TCP/IP stack. If you wish to use another rump TCP/IP server (which has networking configured), simply adjust RUMP_SERVER. Using this method you can for example segregate some "evil" applications to their own networking stack.

Transparent TCP/IP stack restarts

Since the TCP/IP stack runs in a separate process from the client, it is possible to kill and restart the TCP/IP stack from under the application without having to restart the application. Potential uses for this include taking into use features available in later releases or fixing a security vulnerability. Even though NetBSD kernel code rarely crashes, it does happen, and this technique also protects against that.

Since networking stack code does not contain any checkpointing support, killing the hosting process will cause all kernel state to go missing and for example previously used sockets will not be available after restart. Even if checkpointing were added for things like file descriptors, generally speaking checkpointing a TCP connection is not possible. The reaction to this unexpected loss of state largely depends on the application. For example, ssh will not handle this well, but Firefox will generally speaking recover without adverse effects.

Before starting the hijacked application, you should instruct the rump client library to retry connecting to the server in case the connection is lost. This is done by setting RUMPHIJACK_RETRYCONNECT to a value documented on the manual page.

golem> export RUMPHIJACK_RETRYCONNECT=inftime
golem> firefox
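Exporting these variables affects every command later started from that shell. To hijack just one application, the variables can instead be passed to that command alone. The sketch below is a hypothetical helper that prints the assignments so they can be handed to env(1); the library path is the one used in this tutorial and the retry value is the inftime setting shown above.

```shell
# Hypothetical helper: print the environment assignments needed to run
# a single command as a rump client of server $1 with automatic
# reconnection enabled. Intended use: env $(rump_hijack_env URL) command
rump_hijack_env() {
	printf '%s\n' \
	    "LD_PRELOAD=/usr/lib/librumphijack.so" \
	    "RUMP_SERVER=$1" \
	    "RUMPHIJACK_RETRYCONNECT=inftime"
}
```

For example, env $(rump_hijack_env unix:///tmp/netsrv) firefox runs only Firefox against the rump TCP/IP stack. The substitution is left unquoted so it splits into separate assignments, which assumes the URL contains no whitespace.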

Now we can use Firefox just like we would with the host kernel networking stack. When we want to restart the TCP/IP stack, we can use any method we'd like for killing the TCP/IP server, even kill -9 or just having it panic. The client will detect the severed connection and print out the following diagnostic warnings.

rump_sp: connection to kernel lost, trying to reconnect ...
rump_sp: still trying to reconnect ...
rump_sp: still trying to reconnect ...

Once the server has been restarted, the following message will be printed. If the server downtime was long, the client can take up to 10 seconds to retry, so do not be surprised if you do not see it immediately.

rump_sp: reconnected!

Note that this message only signals that the client has a connection to the server. If the server has not yet been configured with an IP address and a gateway, the application will not function normally. However, once that step is complete, normal service can resume.

Any pages that were loading when the TCP/IP server went down will not finish loading. However, this can be "fixed" simply by reloading the pages.

Emulating makefs

The makefs command takes a directory tree and creates a file system image out of it. This groundbreaking utility was developed back when crossbuild capability was added to the NetBSD source tree. Since makefs constructs the file system purely in userspace, it does not depend on the buildhost kernel to have file system support or the build process to have privileges to mount a file system. However, its implementation requires many one-way modifications to the kernel file system driver. Since performing these modifications is complicated, of the NetBSD kernel file systems with read/write support, makefs supports only FFS.

This part of the tutorial will show how to accomplish the same with out-of-the-box binaries. It applies to any r/w kernel file system for which NetBSD ships a newfs utility capable of creating image files. We learn how to mount a file system within the hijacked rump kernel namespace and how to use pax to copy files to the file system image.

First, we need a suitable victim directory tree we want to create an image out of. We will use the nethack source tree as an example. We need to find out how much space the directory tree will require.

golem> du -sh nethack-3.4.3/
12M     nethack-3.4.3/

Next, we need to create an empty file system. We use the standard newfs tool for this (command name will vary depending on target file system type). Since the file system must also accommodate metadata such as inodes and directory entries, we will create a slightly larger file system than what was indicated by du and reserve roughly 10% more disk space. There are ways to increase the accuracy of this calculation, but they are beyond the scope of this document.

golem> newfs -F -s 14M nethack.img
nethack.img: 14.0MB (28672 sectors) block size 4096, fragment size 512
        using 4 cylinder groups of 3.50MB, 896 blks, 1696 inodes.
super-block backups (for fsck_ffs -b #) at:
32, 7200, 14368, 21536,
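The size estimate can be scripted as well. The hypothetical helper below takes the space used in kilobytes (for example from du -sk), adds roughly 10% and rounds up to whole megabytes. For a 12 MB (12288 KB) tree it prints 14M, the size used above; as noted, this is a rough heuristic rather than an exact calculation.

```shell
# Hypothetical helper: given a size in KB ($1), add 10% headroom for
# file system metadata, rounding up, and print the result rounded up
# to whole megabytes with an "M" suffix (e.g. "14M").
fs_size_with_headroom() {
	_kb=$(( ($1 * 110 + 99) / 100 ))
	echo "$(( (_kb + 1023) / 1024 ))M"
}
```

For example: newfs -F -s $(fs_size_with_headroom $(du -sk nethack-3.4.3 | cut -f1)) nethack.img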

Now, we need to start a rump server capable of mounting this particular file system type. As in the cgd example, we map the host image as /dk in the rump kernel namespace.

golem> rump_server -lrumpvfs -lrumpfs_ffs \
    -d key=/dk,hostpath=nethack.img,size=host unix:///tmp/ffs_server

Next, we need to configure our shell for rump syscall hijacking. This is done by pointing the LD_PRELOAD environment variable to the hijack library. Every command executed with the variable set will attempt to contact the rump server and will fail if the server cannot be contacted. This is demonstrated below by first omitting RUMP_SERVER and attempting to run a command. Shell builtins such as export and unset can still be run, since they do not start a new process.

golem> export LD_PRELOAD=/usr/lib/librumphijack.so
golem> lua -e 'print("Hello, rump!")'
lua: rumpclient init: No such file or directory
golem> export RUMP_SERVER=unix:///tmp/ffs_server
golem> lua -e 'print("Hello, rump!")'
Hello, rump!
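Exporting LD_PRELOAD affects every later command in the shell. If you only occasionally want to talk to the rump kernel, the same two variables can be set for a single command instead, the way this tutorial already does with env RUMPHIJACK=... mount. The sketch below assumes the hijack library path shown above; rumpcmd and HIJACKLIB are hypothetical names, not part of NetBSD.

```shell
#!/bin/sh
# Hypothetical helper (not part of NetBSD): run a single command with
# rump syscall hijacking instead of exporting LD_PRELOAD into the shell.
# HIJACKLIB defaults to the library path used in this tutorial.
HIJACKLIB=${HIJACKLIB:-/usr/lib/librumphijack.so}

rumpcmd()
(
	# first argument: service location URL; the rest: the command to run
	server=$1
	shift
	exec env LD_PRELOAD="$HIJACKLIB" RUMP_SERVER="$server" "$@"
)
```

The calling shell's environment stays untouched, so there is no need to unset anything afterwards:

golem> rumpcmd unix:///tmp/ffs_server ls -l /rump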

Now, we can access the rump kernel file system namespace using the special path prefix /rump.

golem> ls -l /rump
total 1
drwxr-xr-x  2 root  wheel  512 Mar 12 13:31 dev

By default, a rump root file system includes only some autogenerated device nodes, based on which components are loaded. As an experiment, you can also try the above against a server which does not support VFS.

We then proceed to create a mountpoint and mount the file system. Note that we start a new shell here: the shell in which we set LD_PRELOAD was not itself started with the variable set, so it does not have hijacking configured and cannot cd into /rump. Everything could be done without changing the current working directory, but changing it often means less typing.

golem> $SHELL
golem> cd /rump
golem> mkdir mnt
golem> df -i mnt
Filesystem   1K-blocks       Used      Avail %Cap    iUsed   iAvail %iCap Mounted on
rumpfs               1          1          0 100%        0        0    0% /
golem> mount_ffs /dk /rump/mnt
mount_ffs: Warning: realpath /dk: No such file or directory
golem> df -i mnt
Filesystem   1K-blocks       Used      Avail %Cap    iUsed   iAvail %iCap Mounted on
/dk              13423          0      12752   0%        1     6781    0% /mnt

Note that the realpath warning from mount_ffs is only a warning and can be ignored. It occurs because the userland utility tries to resolve the source device /dk, which exists only inside the rump kernel. Also note that you need to supply the full path for the mountpoint, i.e. /rump/mnt instead of mnt. Otherwise the userland mount utility may adjust it incorrectly.
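If you prefer typing relative paths after cd /rump, a tiny helper can expand them before they reach mount_ffs. This is only a sketch: abspath is a hypothetical name, and it assumes the hijacked shell tracks $PWD logically (so that it reads /rump after cd /rump).

```shell
#!/bin/sh
# Hypothetical helper (not part of NetBSD): turn a mountpoint given
# relative to the current directory into an absolute path, e.g.
# mnt -> /rump/mnt when the working directory is /rump.
abspath()
{
	case $1 in
	/*)	echo "$1" ;;		# already absolute, pass through
	*)	echo "${PWD%/}/$1" ;;	# prefix the logical working directory
	esac
}
```

golem> mount_ffs /dk $(abspath mnt)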

If you run the mount command, you will notice that the mounted file system is not listed. This is expected: the file system has been mounted within the rump kernel and not the host kernel, so the host kernel knows nothing about it. The list of mounted file systems is fetched with the getvfsstat() system call. Since the system call does not take a pathname, the hijack library cannot automatically determine whether the user wants the mountpoints of the host kernel or of the rump kernel. However, the user can configure this behaviour by setting the RUMPHIJACK environment variable to contain the string vfs=getvfsstat.

golem> env RUMPHIJACK=vfs=getvfsstat mount               
rumpfs on / type rumpfs (local)
/dk on /mnt type ffs (local)
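Note that the rump kernel reports the mountpoint as /mnt, its own view of the namespace, not /rump/mnt. A script that wants to check whether the file system is mounted must therefore look for the rump-side path. A minimal sketch, assuming mount output in the "X on Y type Z" form shown above; is_mounted is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper (not part of NetBSD): read mount(8) output on
# stdin and succeed if the given mountpoint is listed. Pass the path as
# the rump kernel sees it (/mnt), not the host-side /rump/mnt.
is_mounted()
{
	# -F: match the text literally, -q: only report via exit status
	grep -qF " on $1 type "
}
```

golem> env RUMPHIJACK=vfs=getvfsstat mount | is_mounted /mnt && echo mounted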

Other ways of configuring the behaviour of system call hijacking are described in the rumphijack(3) manual page. Note that setting the variable overrides the default behaviour, including the ability to access /rump. You can restore this by setting the variable to vfs=getvfsstat,path=/rump. As with LD_PRELOAD, setting the variable affects only processes you run after setting it; the behaviour of the shell it was set in remains unchanged.

Now we can copy the files over. Due to how pax works, we first change our working directory to avoid encoding the full source path in the destination. The alternative is to use the -s option, but I find that changing the directory is often simpler.

golem> cd ~/srcdir
golem> pax -rw nethack-3.4.3 /rump/mnt/
golem> df -i /rump/mnt/
Filesystem   1K-blocks       Used      Avail %Cap    iUsed   iAvail %iCap Mounted on
/dk              13423      11962        790  93%      695     6087   10% /mnt
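For completeness, here is a sketch of the -s alternative mentioned above, which avoids changing directory. The substitution strips the source's parent directory from every pathname pax copies, so the tree lands in /rump/mnt/nethack-3.4.3 instead of under the full source path. copytree is a hypothetical name, and the sketch assumes the parent path contains no commas (the substitution delimiter) or regular expression metacharacters that would change the match.

```shell
#!/bin/sh
# Hypothetical helper (not part of NetBSD): copy a directory tree into
# a destination directory with pax, without changing directory first.
copytree()
(
	src=${1%/}			# drop a trailing slash, if any
	parent=$(dirname "$src")
	# strip "parent/" from the front of every copied pathname
	exec pax -rw -s ",^$parent/,," "$src" "$2"
)
```

golem> copytree ~/srcdir/nethack-3.4.3 /rump/mnt/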

For comparison, we present the same operation using cp. Obviously, only one of pax or cp is necessary and you can use whichever you find more convenient.

golem> cp -Rp ~/srcdir/nethack-3.4.3 /rump/mnt/
golem> df -i /rump/mnt/
Filesystem   1K-blocks       Used      Avail %Cap    iUsed   iAvail %iCap Mounted on
/dk              13423      11962        790  93%      695     6087   10% /mnt

Then, the only thing left is to unmount the file system to make sure that we have a clean file system image.

golem> umount -R /rump/mnt
golem> df -i /rump/mnt
Filesystem   1K-blocks       Used      Avail %Cap    iUsed   iAvail %iCap Mounted on
rumpfs               1          1          0 100%        0        0    0% /

It is necessary to give the -R option to umount; otherwise it will attempt to adjust the path by itself, which usually results in the wrong path and a failed unmount. It is possible to set RUMPHIJACK in a way which does not require -R, but that is left as an exercise for the reader.

We do not need to remove the mountpoint since the rump root file system is an in-memory file system and will be removed automatically when we halt the server.

Congratulations, you now have a clean file system image containing the desired files.

Master class: NFS server

This section presents scripts which start a rump kernel capable of serving NFS file systems and which mount the export using a client connected to another rump kernel server. At this stage all the relevant pointers to manual pages have been given, so the scripts are merely presented instead of being explained thoroughly.

NFS Server

#!/bin/sh
#
# This script starts a rump kernel with NFS serving capability,
# configures a network interface and starts hijacked binaries
# which are necessary to serve NFS (rpcbind, mountd, nfsd).
#

# directory used for all temporary stuff
NFSX=/tmp/nfsx

# no need to edit below this line

haltserv()
{
	RUMP_SERVER=unix://${NFSX}/nfsserv rump.halt 2> /dev/null
	RUMP_SERVER=unix://${NFSX}/nfscli rump.halt 2> /dev/null
}

die()
{
	haltserv
	echo "$*" >&2
	exit 1
}

# start from a fresh table
haltserv
rm -rf ${NFSX}
mkdir ${NFSX} || die cannot mkdir ${NFSX}

# create ffs file system we'll be exporting
newfs -F -s 10000 ${NFSX}/ffs.img > /dev/null || die could not create ffs

# start nfs kernel server.  this is a mouthful
export RUMP_SERVER=unix://${NFSX}/nfsserv
rump_server -lrumpvfs -lrumpdev -lrumpnet				\
    -lrumpnet_net -lrumpnet_netinet -lrumpnet_local -lrumpnet_shmif	\
    -lrumpdev_disk -lrumpfs_ffs -lrumpfs_nfs -lrumpfs_nfsserver 	\
    -d key=/dk,hostpath=${NFSX}/ffs.img,size=host ${RUMP_SERVER}
[ $? -eq 0 ] || die rump server startup failed

# configure server networking
rump.ifconfig shmif0 create
rump.ifconfig shmif0 linkstr ${NFSX}/shmbus
rump.ifconfig shmif0 inet 10.1.1.1

# especially rpcbind has a nasty habit of looping
export RUMPHIJACK_RETRYCONNECT=die
export LD_PRELOAD=/usr/lib/librumphijack.so

# "mtree"
mkdir -p /rump/var/run
mkdir -p /rump/var/db
touch /rump/var/db/mountdtab
mkdir /rump/etc
mkdir /rump/export

# create /etc/exports
echo '/export -noresvport -noresvmnt -maproot=0:0 10.1.1.100' |		\
    dd of=/rump/etc/exports 2> /dev/null

# mount our file system
mount_ffs /dk /rump/export 2> /dev/null || die mount failed
touch /rump/export/its_alive

# start rpcbind.  we want /var/run/rpcbind.sock
RUMPHIJACK='blanket=/var/run,socket=all' rpcbind || die rpcbind start

# ok, then we want mountd in the similar fashion
RUMPHIJACK='blanket=/var/run:/var/db:/export,socket=all,path=/rump,vfs=all' \
    mountd /rump/etc/exports || die mountd start

# finally, it's time for the infamous nfsd to hit the stage
RUMPHIJACK='blanket=/var/run,socket=all,vfs=all' nfsd -tu

NFS Client

#!/bin/sh
#
# This script starts a rump kernel which contains the drivers necessary
# to mount an NFS export.  It then proceeds to mount and provides
# a directory listing of the mountpoint.
#

NFSX=/tmp/nfsx

export RUMP_SERVER=unix://${NFSX}/nfscli
rump.halt 2> /dev/null
rump_server -lrumpvfs -lrumpnet -lrumpnet_net -lrumpnet_netinet		\
    -lrumpnet_shmif -lrumpfs_nfs ${RUMP_SERVER}

rump.ifconfig shmif0 create
rump.ifconfig shmif0 linkstr ${NFSX}/shmbus
rump.ifconfig shmif0 inet 10.1.1.100

export LD_PRELOAD=/usr/lib/librumphijack.so

mkdir /rump/mnt

mount_nfs 10.1.1.1:/export /rump/mnt

echo export RUMP_SERVER=unix://${NFSX}/nfscli
echo export LD_PRELOAD=/usr/lib/librumphijack.so

Using it

To use the NFS server, run the server script first and then the client script. The client script prints configuration data, so you can eval its output in a Bourne-type shell to get the correct configuration.

golem> sh rumpnfsd.sh
golem> eval `sh rumpnfsclient.sh`

That's it. You can start a shell and access the NFS mount as normal.

golem> df /rump/mnt
Filesystem       1K-blocks       Used      Avail %Cap Mounted on
10.1.1.1:/export       4631          0       4399   0% /mnt
golem> sh
golem> cd /rump
golem> jot 100000 > mnt/numbers
golem> df mnt
Filesystem       1K-blocks       Used      Avail %Cap Mounted on
10.1.1.1:/export       4631        580       3819  13% /mnt

When you're done, stop the servers in the normal fashion. You may also want to remove the /tmp/nfsx temporary directory.
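The normal fashion is rump.halt, as used by the haltserv function in the server script. The sketch below bundles that with removing the scratch directory. nfsx_teardown is a hypothetical name; halting the client before the server is an assumption on our part (it lets the client drop its NFS mount while the server is still reachable), not something the scripts require.

```shell
#!/bin/sh
# Hypothetical helper (not part of NetBSD): tear down the NFS example.
# NFSX matches the scripts above unless overridden in the environment.
NFSX=${NFSX:-/tmp/nfsx}

nfsx_teardown()
{
	# halt the client kernel first, then the server kernel
	for kern in nfscli nfsserv; do
		# ignore errors: the kernel may already be gone
		RUMP_SERVER=unix://${NFSX}/${kern} rump.halt 2> /dev/null
	done
	# remove sockets, the shmif bus file and the ffs image
	rm -rf "${NFSX}"
}
```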

Further ideas

Kernel code development and debugging was a huge personal motivation for working on this, and is a truly excellent use case especially if you want to safely and easily learn about how various parts of the kernel work.

There are also more user-oriented applications. For example, you can construct servers which run hardware drivers from some later release of NetBSD than what is running on your host. You can also distribute these devices as services on the network.

On a multiuser machine where you do not have control over how your data is backed up you can use a cgd server to provide a file system with better confidentiality guarantees than your regular home directory. You can easily configure your applications to communicate directly with the cryptographic server, and confidential data will never hit the disk unencrypted. This, of course, does not protect against all threat models on a multiuser system, but is a simple way of protecting yourself against one of them.

Furthermore, you have more fine-grained control over privileges. For example, opening a raw socket requires root privileges. This is still true for a rump server, but the difference is that it requires root privileges in the rump kernel, not the host kernel. As long as the rump server runs with normal user privileges (as is recommended), rump kernel root privileges cannot be used to gain full control of the host OS.

In the end, this document has only scratched the surface of what is possible by running kernel code as services in userspace.

Acknowledgements

Jeff Rizzo gave comments on a draft and pointed out some problems with various commands. Roland C. Dowdeswell commented on the section about cgd.