Capfs: Capability file-system

(1) Introduction

Capfs is a little experiment in usable security interface design.

Capfs has three technical goals:

(a) Allow an administrator to declare a different initial allocation
    of capabilities and control them during runtime,

(b) Allow developers to "bracket" the use of such capabilities,
    gaining them only when necessary and dropping them otherwise, in
    an easy-to-use, language-agnostic interface, and

(c) Be as unintrusive as possible, requiring close to zero userland
    adaptation.

The way capfs attempts to achieve these goals is by providing a pseudo
file-system similar to procfs that is mounted together with a mapping
between users and capabilities. Gaining and dropping capabilities is
accomplished by using simple file operations like open, close, remove,
etc., allowing interaction using existing and familiar tools for
administrators and an intuitive learning process for developers.

Section 2 of this document describes some basic concepts. Usage
examples for programmers and administrators are in sections 3 and 4,
respectively. Section 5 details how to set up one or more (e.g. for
chrooted programs) capfs instances. Finally there are some
implementation notes (section 6) and possible directions for future
development (section 7).

Code implementing capfs can be downloaded from:

    http://www.NetBSD.org/~elad/capfs/capfs.tar.gz

This document can be found at:

    http://www.NetBSD.org/~elad/capfs/capfs.txt

(2) Basic concepts

The capfs model gives administrators the opportunity to create a
different allocation of privileges. Instead of the traditional Unix
"super-user" concept where it is implicitly defined that an effective
user-id zero receives all privileges and all other users receive none,
capfs allows specifying the initial set associated with each user in a
configuration file.

Under capfs, every process starts with two sets of capabilities:

(a) Permitted capabilities: indicating what capabilities the process
    can request.
    Having a capability permitted does not imply it is in use.
    Ideally, a process will begin by permanently removing from this
    list any capabilities it will never need.

(b) Effective capabilities: the list of currently effective and in-use
    capabilities. Ideally, this list should be kept as short as
    possible and capabilities made effective on an as-needed basis.

The hierarchy is /cap/<pid>/{permitted,effective}/<capability>, where
<pid> is the process-id or "self" if referencing the current process,
"permitted" or "effective" reference the relevant set as above, and
<capability> is the capability name.

For example, if a process with PID 1234 has the "raw_socket"
capability (indicating it's allowed to open a raw socket) permitted,
it will appear as /cap/1234/permitted/raw_socket. Once the process
gains it by opening this pseudo file, it will also appear as
/cap/1234/effective/raw_socket.

(3) Usage: Programmers

Using capfs inside programs requires no additional linking or new
APIs. All the programmer has to know is (a) where capfs is mounted
(e.g. /cap) and (b) what each capability is called.

A "permitted" capability is made "effective" with open(2) and is
removed from the effective set with close(2) or unlink(2). For
example, assuming capfs is mounted on /cap, in a C program the
programmer can do the following:

    /* Get raw_socket capability. */
    cap_fd = open("/cap/self/permitted/raw_socket", O_RDONLY);
    if (cap_fd == -1)
            err(EXIT_FAILURE, "unable to get raw_socket capability");

    /* Open raw socket. */
    sock = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);

    /* Drop the raw_socket capability. */
    close(cap_fd);

Effective capabilities can also be removed by using unlink(2):

    /* Drop the raw_socket capability. */
    unlink("/cap/self/effective/raw_socket");

(This may seem like duplicate functionality, but is actually required
to support traditional behavior where a process begins with
capabilities without having to request them, and thus does not have a
file descriptor to close(2).)
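The open/close bracketing described above can be wrapped in a small
helper so that the capability is dropped even when the bracketed code
fails. What follows is a minimal sketch in Python 3, not part of capfs
itself: the helper name "capability" and its "mount" parameter are
hypothetical, and it assumes capfs is mounted on /cap unless told
otherwise.

```python
import contextlib
import os


@contextlib.contextmanager
def capability(name, mount="/cap"):
    """Hold capability `name` for the duration of a with-block.

    Hypothetical helper, not part of capfs. Opening the pseudo file
    under permitted/ makes the capability effective; closing the
    descriptor drops it again.
    """
    path = os.path.join(mount, "self", "permitted", name)
    fd = os.open(path, os.O_RDONLY)  # raises OSError if not permitted
    try:
        yield fd
    finally:
        os.close(fd)  # drop the capability, even on error
```

A caller would then place the privileged operation inside
"with capability("raw_socket"):" and nothing else, which keeps the
effective set as short as the model recommends.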
If the program will never need the "raw_socket" capability again, the
programmer can permanently remove it:

    /* Permanently remove raw_socket capability. */
    unlink("/cap/self/permitted/raw_socket");

At any point the programmer can check for permitted or effective
capabilities:

    /* Regain raw_socket capability if we don't have it. */
    /* stat(2) returns -1 and sets errno on failure. */
    rv = stat("/cap/self/effective/raw_socket", &sb);
    if (rv == -1 && errno == ENOENT) {
            /* Is it even permitted? */
            rv = stat("/cap/self/permitted/raw_socket", &sb);
            if (rv == -1 && errno == ENOENT)
                    err(EXIT_FAILURE, "raw_socket capability not permitted");
    }

Capabilities can be used in practically every language. For example,
Python:

    whatever$ python2.6
    Python 2.6.6 (r266:84292, Apr 4 2011, 12:12:56)
    [GCC 4.1.3 20080704 prerelease (NetBSD nb2 20081120)] on netbsd5
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from socket import *
    >>> sock = socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/pkg/lib/python2.6/socket.py", line 184, in __init__
        _sock = _realsocket(family, type, proto)
    socket.error: [Errno 1] Operation not permitted
    >>> cap_fd = open("/cap/self/permitted/raw_socket")
    >>> sock = socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
    >>> cap_fd.close()
    >>> sock = socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/pkg/lib/python2.6/socket.py", line 184, in __init__
        _sock = _realsocket(family, type, proto)
    socket.error: [Errno 1] Operation not permitted
    >>>

Or Lua:

    whatever$ lua
    Lua 5.1.4 Copyright (C) 1994-2008 Lua.org, PUC-Rio
    > os.execute("date -n 0420")
    date: settimeofday: Operation not permitted
    > cap_fd = io.open("/cap/self/permitted/change_time")
    > os.execute("date -n 0420")
    Sun Jul 10 04:20:00 IDT 2011
    > io.close(cap_fd)
    > os.execute("date -n 0420")
    date: settimeofday: Operation not permitted
    >

(Note that since there are no dependencies on external libraries,
bracketing code can always be compiled in and the presence of a capfs
determined during run-time rather than compile-time, making it
semi-portable.)

(4) Usage: Administrators

Capfs also allows administrators to get a quick view of which program
is allowed to do what at any given moment. Traditional tools can be
used to filter and process the output. An administrator can also
manipulate capabilities of running processes.

For example, list processes currently allowed to change time:

    whatever# for pid in `ls -1 /cap/[0-9]*/effective/change_time | cut -d/ -f3`
    > do ps -up $pid | sed -e '1d'
    > done
    elad 10758 0.0 0.6 10016 3300 ? I 5:24PM 0:02.26 sshd: elad@pts/1 (sshd)
    elad 13594 0.0 0.3 3428 1508 ttyp1 I+ 4:23AM 0:00.05 lua
    root 13695 0.0 0.2 2968 1200 ttyp3 S 12:11PM 0:00.49 ksh
    elad 15464 0.0 0.3 3176 1416 ttyp1 Is 5:24PM 0:00.14 -sh
    root 567 0.0 0.5 6948 2388 ? Is Fri02AM 1:47.41 /usr/libexec/postfix/master
    root 9441 0.0 0.3 3176 1424 ttyp3 I 4:40AM 0:00.46 sh
    whatever#

List permitted capabilities for a process:

    whatever# ls /cap/13594/permitted
    bind_privport change_time raw_socket
    whatever#

Controlling capabilities externally is also simple. Below is an
annotated example, showing how an administrator can change the
capabilities of a running program (in this case Python) using basic
tools like rm and touch:

[ Start Python normally and try to open a raw socket. ]

    whatever$ python2.6
    Python 2.6.6 (r266:84292, Apr 4 2011, 12:12:56)
    [GCC 4.1.3 20080704 prerelease (NetBSD nb2 20081120)] on netbsd5
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from socket import *
    >>> sock = socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/pkg/lib/python2.6/socket.py", line 184, in __init__
        _sock = _realsocket(family, type, proto)
    socket.error: [Errno 1] Operation not permitted
    >>>

[ We get EPERM, so we open the permitted capability and try again: ]

    >>> cap_fd = open("/cap/self/permitted/raw_socket")
    >>> sock = socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
    >>>

[ On a different terminal, a root user does the following: ]

    whatever# ps -au | grep python
    elad 9041 0.0 0.7 7668 3760 ttyp1 I+ 6:27PM 0:00.09 python2.6
    whatever# ls /cap/9041/effective
    raw_socket
    whatever# rm /cap/9041/effective/raw_socket
    whatever#

[ This removed the raw socket capability from the effective set. We
  verify back on the first terminal: ]

    >>> sock = socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/pkg/lib/python2.6/socket.py", line 184, in __init__
        _sock = _realsocket(family, type, proto)
    socket.error: [Errno 1] Operation not permitted
    >>>

[ Because it's still in the permitted set though, we can gain it
  back: ]

    >>> cap_fd2 = open("/cap/self/permitted/raw_socket")
    >>> sock = socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
    >>>

[ However, if it is removed permanently, we cannot.
  On a different terminal: ]

    whatever# rm /cap/9041/effective/raw_socket
    whatever# rm /cap/9041/permitted/raw_socket
    whatever#

[ And back in the Python terminal, we no longer have the raw socket
  capability: ]

    >>> sock = socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/pkg/lib/python2.6/socket.py", line 184, in __init__
        _sock = _realsocket(family, type, proto)
    socket.error: [Errno 1] Operation not permitted
    >>> cap_fd3 = open("/cap/self/permitted/raw_socket")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 2] No such file or directory: '/cap/self/permitted/raw_socket'
    >>>

[ At this point, a root user can grant capabilities back. For example,
  the raw socket capability can be added to the permitted set: ]

    whatever# touch /cap/9041/permitted/raw_socket
    whatever#

[ Making it possible for us to open a raw socket: ]

    >>> cap_fd3 = open("/cap/self/permitted/raw_socket")
    >>> sock = socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
    >>>

[ Capabilities can also be added directly to the effective set. For
  example, we first drop the raw socket capability: ]

    >>> cap_fd3.close()
    >>> sock = socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/pkg/lib/python2.6/socket.py", line 184, in __init__
        _sock = _realsocket(family, type, proto)
    socket.error: [Errno 1] Operation not permitted
    >>>

[ Then it's added externally: ]

    whatever# touch /cap/9041/effective/raw_socket
    whatever#

[ And we can use it again without having to open it: ]

    >>> sock = socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
    >>>

Other tools can be written as scripts (or programs) and don't require
modifications to any existing code.
For example, a simple Python script that monitors changes to process
capabilities is available from:

    http://www.NetBSD.org/~elad/capfs/capwatch.py

Sample output while running Python in another terminal, gaining and
dropping the raw_socket capability and exiting:

    whatever$ ./capwatch.py /cap
    14874|new|raw_socket,bind_privport,change_time|
    14874|cap|effective|add|raw_socket
    14874|cap|effective|remove|raw_socket
    14874|gone
    ^C
    whatever$

(Keep in mind that a proper monitor would receive notifications, but
kqueue(2) seems inadequate.)

(5) Setup

Setting up capfs is simple. All that is necessary is a configuration
file indicating the initial allocation of capabilities. Keywords can
be used, simplifying the creation of an initial mapping.

The configuration file is a plist, only because parsing it is easy
using existing tools, although a JSON-like syntax could be more
user-friendly.

The general structure is a control element (named "flags") and a
mapping element ("users"). Each user has a list of strings associated
with it representing capabilities.

Flags:

    traditional         if true, initial "permitted" capabilities are
                        also made "effective."
                        (default: false)

Username entry keywords:

    $unspecified_users  any user not specified by an explicit entry

Capability entry keywords:

    $all_caps           all capabilities
    $unprivileged_caps  traditional capabilities of an unprivileged
                        user
    $privileged_caps    traditional capabilities of a privileged user

For example, here's a configuration file specifying that a user called
"ntpd" is allowed to bind(2) to a privileged port and change the
system time:

    users
        username        ntpd
        capabilities    $unprivileged_caps
                        bind_privport
                        change_time

Theoretically, the traditional security model can be represented by
the following configuration:

    flags
        traditional
    users
        username        root
        capabilities    $all_caps

        username        $unspecified_users
        capabilities    $unprivileged_caps

Once a configuration file is present, it should be passed to
mount_capfs(8):

    # mount_capfs -f capfs.conf /cap /cap

(5.1) Multiple instances

Multiple capfs instances can be present simultaneously, allowing
chrooted programs to use capfs, but also, and more importantly,
allowing identical users to be presented with different sets of
capabilities depending on the root directory.

For example, the root user can be limited. When working normally, and
/ is the root directory, the root user will be unrestricted. However,
when the root directory is /chroot/ntpd it will be restricted to just
the change_time capability, and when it's /chroot/httpd to just
bind_privport.

Setting up multiple instances is simple and requires only one
additional argument to mount_capfs. For example, if we want to mount
capfs for a chrooted ntpd user under /chroot/ntpd, and capfs_ntpd.conf
is our (minimal) configuration file, we would use:

    # mount_capfs -f capfs_ntpd.conf -r /chroot/ntpd /cap /chroot/ntpd/cap

This ensures that when capabilities are initialized, if the
credentials belong to a process whose root directory is /chroot/ntpd,
the initial allocation will be according to that specified in
capfs_ntpd.conf.

An illustration follows.
A program that sets its euid to 1000 and tries to open a raw socket is
in /chroot/rawsocket. Running it "normally" fails, since the top-level
configuration has traditional mode disabled, meaning capabilities must
be gained (as shown above) in order to become effective:

    whatever# /chroot/rawsocket
    setting euid to 1000
    opening raw socket: fail
    whatever#

As can be expected, simply chrooting and running it inside the chroot
fails as well:

    whatever# chroot /chroot /rawsocket
    setting euid to 1000
    opening raw socket: fail
    whatever#

However, once we mount capfs to be available inside the chroot, with a
configuration that allows the user with id 1000 to open a raw socket
and enables traditional mode (meaning permitted capabilities are
automatically effective), we succeed:

    whatever# mount_capfs -f capfs_chroot.conf -r /chroot /cap /chroot/cap
    whatever# chroot /chroot /rawsocket
    setting euid to 1000
    opening raw socket: success
    whatever#

(6) Implementation notes

Capfs is implemented as a pseudo file-system and a set of kauth(9)
listeners. No userland modifications or recompilations are required,
except of course for the addition of mount_capfs(8).

(a) Changes to kauth(9)

Capabilities are attached to credentials as secmodel private data.

The semantics of the KAUTH_CRED_INIT notification were modified and
it's now called when credentials are really initialized (e.g. from a
set-id context) for a certain user. A new KAUTH_CRED_ALLOC
notification was added to reflect initial allocation of credentials
that are not yet initialized.

Another notification, KAUTH_CRED_CHROOT, was added to indicate that a
process is being chrooted. Code similar to that used when changing id
was added to guarantee a new set of credentials. Similar changes are
part of Aleksey Cheusov's securechroot.

(b) Forking

When a process forks, the child gets a reference to the parent's
credentials. This worked fine in traditional Unix settings since
credentials held only user/group ids and whenever those were changed
in e.g.
do_setresuid() the credentials were created anew, leaving each process
with its own set.

In a modern, extensible environment, where a secmodel can add its own
credential data -- like capfs does with capabilities -- this means
that every location that changes secmodel-specific credentials should
also deal with preventing improper propagation.

There are four ways this can be addressed:

(A) Code similar to that in do_setresuid() should be written and
    called by every secmodel when and if it changes its own private
    credential data,

(B) Forking should always create a copy of the credentials,
    guaranteeing no two processes share the same instance,

(C) The secmodel should copy the credentials itself during a fork,
    and properly handle the copy notification as well, or

(D) We extend secmodel_register() to take e.g. a secmodel_id_t that
    describes the secmodel (name, description, etc.) and also
    contains a boolean indicating whether the secmodel needs
    credentials to be copied on fork. Every time a secmodel registers
    with this variable set to 'true' a reference count in kauth(9) is
    raised; when the secmodel deregisters, it's lowered. In
    kauth_proc_fork(), if this counter is zero we do the traditional
    kauth_cred_hold(), but when it's greater than zero we do a copy.

Option (C) is clearly superior to options (A) and (B): not all
secmodels have their own private credential data, so the overhead can
be avoided for those that don't, and it requires much less code to
implement. That said, if more than one secmodel has private credential
data the number of copies will grow exponentially, so it scales
poorly. Ideally option (D) should be implemented, but since this is
beyond the scope of capfs, it remains to be done.

(c) Mounting

The user that mounts capfs is allowed to specify any capabilities in
the configuration; they are not checked against the mounting user's
own capabilities, as they ideally would be.
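Returning to the forking discussion in (b): the reference count of
option (D) can be modeled in a few lines. The sketch below is a
user-space toy in Python 3, not kernel code and not the kauth(9) API;
the class and method names (Cred, Kauth, register, deregister,
proc_fork) are hypothetical stand-ins for secmodel_register(), its
deregistration counterpart and kauth_proc_fork().

```python
import copy


class Cred:
    """Toy credential: a uid plus per-secmodel private data."""

    def __init__(self, uid, private=None):
        self.uid = uid
        self.private = private if private is not None else {}


class Kauth:
    """Toy stand-in for the registration and fork paths of option (D)."""

    def __init__(self):
        # Number of registered secmodels that asked for copy-on-fork.
        self.copy_on_fork = 0

    def register(self, needs_copy):
        if needs_copy:
            self.copy_on_fork += 1

    def deregister(self, needs_copy):
        if needs_copy:
            self.copy_on_fork -= 1

    def proc_fork(self, parent_cred):
        if self.copy_on_fork == 0:
            # Traditional behavior: the child shares the reference.
            return parent_cred
        # At least one secmodel has private data: the child gets a copy.
        return copy.deepcopy(parent_cred)
```

With no interested secmodel registered, proc_fork() hands back the
parent's object; once one registers with needs_copy set, the child's
credentials are independent, so dropping a capability in the child no
longer affects the parent.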
(d) Supported capabilities

So far the only capabilities supported are:

    bind_privport       bind to a privileged port
    change_time         change the system time
    raw_socket          open a raw socket

The $all_caps and $privileged_caps sets are identical and contain all
three. The $unprivileged_caps set is empty.

New capabilities can be added by adding a definition for the
capability to sys/secmodel/capfs/capfs_common.h, e.g.:

    #define CAP_REBOOT CAP_BIT(3UL)

and updating the relevant sets (further down in the same file) if
needed. The CAP_FIRST/CAP_LAST definitions should also be kept in sync
as indicated.

Then the capability name should be added to the cap_descriptions array
in sys/secmodel/capfs/capfs_common.c, e.g.:

    { CAP_REBOOT, "reboot" },

Finally, handling for the capability should be added to the secmodel
code itself in sys/secmodel/capfs/secmodel_capfs.c, allowing an
operation if it is present, e.g., in the system scope listener,
secmodel_capfs_system_cb():

    [...]
    case KAUTH_SYSTEM_REBOOT:
            if (CAP_PRESENT(capdata->caps_effective, CAP_REBOOT))
                    result = KAUTH_RESULT_ALLOW;
            break;
    [...]

(e) Definition of capability

The term "capability" is defined as "the ability to perform an
operation a security model can hook and make a decision about."

In this implementation, capfs implements a security model using
kauth(9), and can only provide capabilities that correspond to
decisions kauth(9) has a say about. If a certain task, e.g.
connect(2), does not ask for authorization before proceeding, it falls
outside the scope of what a capability is for this experiment because
a security model cannot hook into the (non-existent) decision process.

(7) Future work

In addition to the obvious necessity of adding more capabilities,
future work could allow attaching capabilities to programs as well,
replacing the set-id bits currently in use.
This takes place outside the scope of capfs per se, but can use the
same interfaces, so that when a process starts, its entry is looked up
and capabilities are loaded and attached to the credentials regardless
of the user.

Once this is done, a combination of the two can be supported, such
that it's possible to specify "user foo has change_time only when
executing /bin/date" and so limit capabilities to approved programs
rather than have them as a general trait of the user.

Both features should be based on fileassoc(9), and will first require
finishing up the KPI to support plugging in to the entire file
life-cycle. Code that can be used for an initial implementation was
published in the following email:

    http://mail-index.netbsd.org/tech-security/2009/06/27/msg000238.html

(8) Status

Capfs is new and has not been tested thoroughly. It is possible it
contains bugs that will have security or stability implications (or
both). Due to items (c) and (d) in section 6 above it is not yet
entirely flexible. For all of these reasons, capfs should be
considered highly experimental and not relied on in any way.

(9) Author

Elad Efrat