1. Introduction

NPF is a layer 3 packet filter, supporting IPv4 and IPv6, as well as layer 4 protocols such as TCP, UDP and ICMP. NPF offers the traditional set of features provided by most packet filters. This includes stateful packet filtering, network address translation (NAT), tables (using a hash or tree as a container), rule procedures for easy development of NPF extensions, packet normalisation and logging, connection saving and restoring, and more. NPF focuses on high performance design, the ability to handle a large volume of clients and exploiting the speed of multi-core systems. It was written from scratch in 2009 and is released under the permissive 2-clause BSD license.

1.1. Brief notes on design

NPF uses Berkeley Packet Filter (BPF) byte-code, which is just-in-time (JIT) compiled into machine code. Each rule is described by a sequence of low level operations to perform for a packet. This design has the advantage of protocol independence; therefore, support for new protocols (for example, at layer 7) or custom filtering patterns can easily be added at the userspace level without any modifications to the kernel itself.

NPF provides rule procedures as the main interface to implement custom extensions. The syntax of the configuration file supports arbitrary procedures with their parameters, as supplied by the extensions. An extension consists of two parts: a dynamic module (.so file) supplementing the npfctl(8) utility and a kernel module (.kmod file). Kernel interfaces are available for extensions to use, so that modifications to the NPF core code are not necessary.

The internals of NPF are abstracted into well defined modules and follow strict interfacing principles to ease extensibility. Communication between userspace and the kernel is provided through the library — libnpf, described in the npf(3) manual page. It can conveniently be used by developers who create their own extensions or third party products based on NPF. Application level gateways (ALGs), such as support for traceroute(8), are also abstracted in separate modules.

1.2. Processing

NPF intercepts the packets at layer 3 on entry to the IP stack. The packet may be rejected before NPF inspection if it is malformed, i.e. has an invalid IPv4 or IPv6 header or invalid header fields. Incoming IP packets are passed to NPF before IP reassembly. Unless disabled, reassembly is performed by NPF.

Processing is performed on each interface a packet traverses, either incoming or outgoing. Support for processing on the forwarding path and fast-forward optimisations are planned for a future release.

Processing order within NPF is as follows:

  1. state inspection
  2. rule inspection (if no state)
  3. address translation
  4. rule procedure

2. Configuration

The first step is configuring general networking settings in the system, for example assigning the addresses and bringing up the interfaces. NetBSD beginners can consult rc(8), rc.conf(5), ifconfig.if(5) and other manual pages. The second step is to create NPF’s configuration file (by default, /etc/npf.conf). We will give an overview with some simple and practical examples. A detailed description of the syntax and options is provided in the npf.conf(5) manual page. The following is a simplistic configuration, which contains two groups for two network interfaces and a default group:

srv# cat /etc/npf.conf
alg "icmp"
$ext_if = { inet4(wm0), inet6(wm0) }
$int_if = { inet4(wm1), inet6(wm1) }
group "external" on $ext_if {
        pass all
}
group "internal" on $int_if {
        pass all
}
group default {
        pass final on lo0 all
        block all
}

It will be explained in detail below.

2.1. Control

NPF can be controlled through the npfctl(8) utility or NetBSD’s rc.d system. The latter is used during system startup, but essentially it is a convenience wrapper. The following is an example for starting NPF and loading the configuration through the rc.d script:

srv# echo 'npf=YES' >> /etc/rc.conf
srv# /etc/rc.d/npf reload
Reloading NPF ruleset.
srv# /etc/rc.d/npf start
Enabling NPF.
srv# npfctl
Usage:  npfctl start | stop | flush | show | stats
        npfctl validate | reload [<rule-file>]
        npfctl rule "rule-name" { add | rem } <rule-syntax>
        npfctl rule "rule-name" rem-id <rule-id>
        npfctl rule "rule-name" { list | flush }
        npfctl table <tid> { add | rem | test } <address/mask>
        npfctl table <tid> { list | flush }
        npfctl sess-load | sess-save

Any modification of npf.conf requires reloading of the ruleset with the reload command in order to make the changes active. One difference from other packet filters is the behaviour of the start and stop commands. These commands do not actually change (i.e. load or unload) the active configuration. Running start will only enable the passing of packets through NPF, while stop will disable such passing. Therefore, a configuration should first be activated using the reload command and filtering then enabled with start. Similarly, clearing of the active configuration is done by performing the stop and flush commands. This behaviour allows users to efficiently disable and enable filtering without actually changing the active configuration, which might be unnecessary and expensive.
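
For example, to disable filtering and then clear the active configuration:

srv# /etc/rc.d/npf stop
srv# npfctl flush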

2.2. Variables

Variables are general purpose keywords which can be used in the ruleset to make it more flexible and easier to manage. Most commonly, variables are used to define one of the following: IP addresses, networks, ports or interfaces. A variable can contain multiple elements.

In the example above, network interfaces are defined using the $ext_if and $int_if variables (note that the dollar sign ($) indicates a variable), which can be used further in the configuration file.

Certain functions can be applied to interfaces: inet4() and inet6(). These functions extract the IPv4 or IPv6 addresses of the interface, respectively.
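
For illustration, a hypothetical fragment where a variable holds multiple elements (a list of TCP services) and is used as a port criterion in a rule (the service names and port number are illustrative):

$services_tcp = { http, https, 8000 }

group "external" on $ext_if {
        pass stateful in final proto tcp to $ext_if port $services_tcp
}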

2.3. Groups

Having one huge ruleset for all interfaces or directions might be inefficient; therefore, NPF requires that all rules be defined within groups. Groups can be thought of as higher level rules which can contain subrules. The main properties of a group are its interface and traffic direction. Packets matching group criteria are passed to the ruleset of that group. If a packet does not match any group, it is passed to the default group. The default group must always be defined.

In the given example, packets passing through the wm0 network interface will first be inspected by the rules in the group named "external"; if none matches, the default group will be inspected. Accordingly, if the packet is passing through wm1, the group "internal" will be inspected first, etc. If the packet is on neither wm0 nor wm1, only the default group will be inspected.

2.4. Rules

Rules, which are the main part of the NPF configuration, describe the criteria used to inspect and make decisions about packets. Currently, NPF supports filtering on the following criteria: interface, traffic direction, protocol, IP address or network, TCP/UDP port or range, TCP flags and ICMP type/code. Supported actions are blocking or passing the packet.

Each rule has a priority, which is set according to its order in the ruleset. Rules defined first are accordingly inspected first. All rules in the group are inspected sequentially and the last matching one dictates the action to be taken. Rules, however, may be explicitly marked as final. In such cases, processing stops after encountering the first matching rule marked as final. If there is no matching rule in the custom group, then as described previously, rules in the default group will be inspected.

In the example, both interfaces have a "pass all" rule, which permits any incoming and outgoing packets on these interfaces.
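
To illustrate rule priority and the final keyword, consider the following hypothetical fragment, built only from constructs shown elsewhere in this document. An incoming SSH packet matches both rules, but the final keyword stops rule inspection at the pass rule; any other packet matches only "block all" and is dropped:

group "external" on $ext_if {
        block all
        pass in final proto tcp to $ext_if port ssh
}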

2.5. Tables

A common problem is the addition or removal of many IP addresses for a particular rule or rules. Reloading the entire configuration is a relatively expensive operation and is not suitable for a stream of constant changes. It is also inefficient to have many different rules with the same logic just for different IP addresses. Therefore, NPF tables are provided as a high performance container to solve this problem.

NPF tables are containers designed for large IP sets and frequent updates. They are managed separately, without reloading of the active configuration: this can be done dynamically, or the table contents can be loaded from a separate file, which is useful for large static tables.

There are two supported NPF table types: "tree" and "hash". The underlying data structure, accordingly, is either a PATRICIA radix tree or a hash table. These data structures allow NPF to perform efficient lookups. Tree tables perform IP prefix matching, therefore both single addresses and address ranges may be added into the table. Hash tables, in contrast, can only store single IP addresses.

The following fragment is an example using two tables:

table <blacklist> type hash file "/etc/npf_blacklist"
table <permitted> type tree dynamic
group "external" on $ext_if {
        block in final from <blacklist>
        pass stateful out final from <permitted>
}

The static table identified as "blacklist" is loaded from a file (in this case, located at /etc/npf_blacklist). The dynamic table is initially empty and has to be filled once the configuration is loaded. Tables can be filled and controlled using the npfctl(8) utility. Examples to flush a table, add an entry and remove an entry from the table:

srv# npfctl table "blacklist" flush
srv# npfctl table "permitted" add 10.0.1.0/24
srv# npfctl table "permitted" rem 10.0.2.1
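
The file of a static table is expected to list one entry per line; a hypothetical /etc/npf_blacklist for the hash table above (single addresses only, since it is a hash table) might be:

192.168.97.100
192.168.97.101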

A public ioctl(2) interface for applications to manage the NPF tables is also provided.

2.6. Rule procedures

Rule procedures are a key interface in NPF, designed to perform custom actions on packets. Users can implement their own specific functionality as a kernel module extending NPF. NPF extensions are discussed thoroughly in the later chapter on the Extensions API.

The configuration file syntax is flexible enough to accept calls to such procedures with variable arguments. Apart from syntax validation, the npfctl(8) utility has to perform extra checks while loading the configuration: it checks whether the custom procedure is registered in the kernel and whether the arguments of the procedure are valid (e.g. that the passed values are permitted). There are built-in rule procedures provided by NPF, e.g. packet logging and traffic normalisation.

The following is an example of two rule procedure definitions — one for logging and another one for normalisation:

procedure "log" {
        log: npflog0
}
procedure "norm" {
        normalize: "random-id", "min-ttl" 512, "max-mss" 1432
}

Traffic normalisation has a set of different mechanisms. In the example above, the normalisation procedure has arguments which apply the following mechanisms: IPv4 ID randomisation, Don’t Fragment (DF) flag cleansing, minimum TTL enforcement and TCP MSS "clamping".

To execute the procedure for a certain rule, use the apply keyword:

group "external" on $ext_if {
        block in final from <blacklist> apply "log"
}

In the case of stateful inspection (when a rule contains the stateful keyword), the rule procedure will be associated with the state, i.e. the connection. Therefore, a rule procedure is applied not only to the first packets which match the rule, but also to all subsequent packets belonging to the connection. It should be noted that a rule procedure is associated with the connections for their entire life cycle (until all associated connections close), i.e. a rule procedure may stay active even if it was removed from the configuration.
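
For example, a hypothetical fragment combining stateful filtering with a rule procedure, so that the "log" procedure is applied to all packets of each tracked connection:

group "external" on $ext_if {
        pass stateful out final proto tcp from $localnet apply "log"
}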

2.7. Application Level Gateways

Certain application layer protocols are not compatible with NAT and require translation outside layers 3 and 4. Such translation is performed by packet filter extensions called application level gateways (ALGs). Common cases are the traceroute and FTP applications.

Support for traceroute (both the ICMP and UDP cases) is built-in, unless NPF is used as a set of kernel modules. In that case, the ALG kernel module can be autoloaded through the configuration, e.g. by adding the following line to npf.conf:

alg "icmp"

Alternatively, the ALG kernel module can be loaded manually:

modload npf_alg_icmp

3. Dynamic rules

NPF has support for dynamic rules, which can be added to or removed from a given ruleset without reloading the entire configuration. Consider the following fragment:

group default {
        ruleset "test-set"
}

Dynamic rules can be managed using npfctl(8):

$ npfctl rule "test-set" add block proto icmp from 192.168.0.6
OK 1
$ npfctl rule "test-set" list
block proto icmp from 192.168.0.6
$ npfctl rule "test-set" add block from 192.168.0.7
OK 2
$ npfctl rule "test-set" list
block proto icmp from 192.168.0.6
block from 192.168.0.7
$ npfctl rule "test-set" rem block from 192.168.0.7
$ npfctl rule "test-set" rem-id 1
$ npfctl rule "test-set" list

Each rule gets a unique identifier, which is returned on addition. The identifier should be treated as an alphanumeric string. As shown in the example, there are two methods to remove a rule:

  • Using a unique identifier (rem-id command).

  • Passing the exact rule, in which case a hash computed over the rule identifies it (rem command).

In the second case, a SHA1 hash is computed over the rule to identify it. Although very unlikely, this is subject to hash collisions. The first method is fully reliable as well as more efficient, and is therefore recommended.

4. Stateful filtering

TCP is a connection-oriented protocol, which means that network stacks keep a state structure for each connection. The state is updated during the session. A specific session is determined by the source and destination IP addresses, the port numbers and the direction of the initial packet. Additionally, TCP is responsible for reliable transmission, which is achieved using TCP sequence and window numbers. Validating the data of each packet against the data in the state structure, as well as updating the state structure, is called TCP state tracking. Since packet filters are middle points between the hosts (i.e. senders and receivers), they have to perform their own TCP state tracking for each session in order to reliably distinguish different TCP connections and perform connection-based filtering.

Heuristic algorithms are used to handle out-of-order packets and packet losses, and to protect connections from malicious packet injection. Using conceptually the same technique, limited tracking of message-based protocols, mainly UDP and ICMP, can also be performed. Packet filters which have the described functionality are called stateful packet filters. For a more detailed description of the mechanism, one can refer to Rooij G., "Real stateful TCP packet filtering in IP Filter", 10th USENIX Security Symposium invited talk, Aug. 2001.

NPF is a stateful packet filter capable of tracking TCP connections, as well as performing limited UDP and ICMP tracking. Stateful filtering is enabled using the "stateful" keyword. In such cases, as described in the previous paragraph, a state (a session) is created and any further packets of the connection are tracked. Packets in the backwards stream, after having been confirmed to belong to the same connection, are passed without ruleset inspection. Example configuration fragment with stateful rules:

group "external" on $ext_if {
        block all
        pass stateful in final proto tcp flags S/SA to $ext_if port ssh
}

In this example, all incoming and outgoing traffic on the $ext_if interface will be blocked, with the exception of incoming SSH traffic (with the destination being an IP address of this interface) and the implicitly passed backwards stream (outgoing reply packets) of these SSH connections. Since initial TCP packets opening a connection are SYN packets, such rules often have additional TCP filter criterion. The expression flags S/SA extracts SYN and ACK flags and checks that SYN is set and ACK is not.

5. Network Address Translation

NPF supports various forms of network address translation (NAT). Dynamic (stateful) NAT is supported, which includes traditional NAT (known as NAPT or masquerading), bi-directional NAT and port forwarding. Additionally, there is support for static (stateless) NAT: simple 1:1 mapping of IPv4 addresses, as well as IPv6-to-IPv6 network prefix translation (NPTv6). NAT64 (the protocol translation) is planned for a future release of NPF.

It should be remembered that dynamic NAT, as a concept, relies on stateful filtering and therefore performs it implicitly. The following is an example configuration fragment of a traditional NAPT setup:

map $ext_if dynamic $localnet -> $ext_if
group "external" on $ext_if {
        pass stateful out final proto tcp flags S/SA from $localnet
}

The first line enables traditional NAPT (keyword map) on the interface specified by $ext_if for all packets from the network defined in $localnet to any other network (0.0.0.0/0), where the address to translate to is the (only) one on the interface $ext_if (it may be specified directly as well, and has to be specified directly if the interface has more than one IPv4 address).

The arrow indicates the translation type, which can be one of the following:

  • -> for outbound NAT (also known as source NAT).

  • <- for inbound NAT (destination NAT).

  • <-> for bi-directional NAT.

The rule pass … permits all outgoing packets from the specified network. It additionally has stateful tracking enabled with the keyword stateful. Therefore, any incoming packets belonging to the connections which were created by initial outgoing packets will be implicitly passed.

The following two lines are example fragments of bi-directional NAT and port 8080 forwarding to a local IP address, port 80:

map $ext_alt_if dynamic $local_host_1 <-> $ext_alt_if
map $ext_if dynamic $local_host_2 port 80 <- $ext_if port 8080

In the examples above, NPF determines the filter criteria implicitly from the segments on the left and right hand sides. Filter criteria can be specified explicitly using the pass … syntax in conjunction with map. In such a case, the criteria must be complete, i.e. cover both the source and the destination. For example:

map $ext_if dynamic 127.0.0.1 port 8080 <- 0.0.0.0 \
    pass from 10.0.0.1 to $rdr_ip port 80

This rule would redirect only the traffic coming from host 10.0.0.1 with destination port 80 and the corresponding destination address. According to the arrow, the left hand side (as this is inbound NAT) is used as the translation address. It should be noted that the right hand side is ignored (and thus can be 0.0.0.0), as the filter criteria are specified explicitly.

6. Extensions API

NPF provides an extensions framework for easy addition of custom functionality. An extension implements a mechanism which can be applied to packets or connections using the rule procedures described in a previous chapter.

An extension consists of two parts: a parser module which is a dynamic library (.so file) supplementing the npfctl(8) utility and a kernel module. The syntax of npf.conf supports arbitrary procedures with their parameters, as supplied by the modules.

As an example to illustrate the interface, the source code of the random-block (rndblock) extension will be used.

The reader is assumed to have basic familiarity with the kernel interfaces.

6.1. Parser module

The parser module is responsible for indicating which functions are provided by the extension, parsing their parameters and constructing a structure (object) to pass to the kernel module.

The dynamic module should have the following routines, where <extname> represents the name of the extension:

  • Initialisation routine: int npfext_<extname>_init(void);

  • Constructor: nl_ext_t *npfext_<extname>_construct(const char *name);

  • Parameter processor: int npfext_<extname>_param(nl_ext_t *ext, const char *param, const char *val);

The initialisation routine is called once the DSO is loaded; any state initialisation can be performed here. The constructor routine is called for every invoking call in the rule procedure definitions. Consider the following rule procedures in npf.conf:

procedure "test1" {
        rndblock: percentage 30.0;
}
procedure "test2" {
        rndblock: percentage 20.0;
        log: npflog0;
}

There will be three calls to npfext_rndblock_construct(): two with the name "rndblock" and one with the name "log". The routine should match the name against "rndblock" and ignore the "log" case. Note that the first two calls ought to construct two separate objects having different properties. Therefore:

nl_ext_t *
npfext_rndblock_construct(const char *name)
{
        /* Handle only the calls intended for this extension. */
        if (strcmp(name, "rndblock") != 0) {
                return NULL;
        }
        return npf_ext_construct(name);
}

Multiple functions can be supported by a single extension, e.g. it may match both "rndblock" and "rnd-block", or another function implementing some different functionality.

Upon object creation, the parameter processing routine is invoked for every specified function argument, each of which is a key-value pair. Therefore, in the first case, npfext_rndblock_param() would be called with param being "percentage" and val being "30.0". This routine is responsible for parsing the values, validating them and setting the extension object accordingly. For an example, see npfext_rndblock_param(). Note that a single parameter may be passed, in which case val is NULL. The routine should return zero on success and an error number on failure, in which case npfctl will issue an error. The npf(3) library provides an interface to set attributes of various types, e.g. npf_ext_param_u32.
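
The following is a minimal sketch of such a routine, not the actual rndblock source: the scaling of the percentage value, the simplified conversion and the assumed (ext, key, value) signature of npf_ext_param_u32() are illustrative.

int
npfext_rndblock_param(nl_ext_t *ext, const char *param, const char *val)
{
        if (strcmp(param, "percentage") == 0) {
                if (val == NULL) {
                        return EINVAL;
                }
                /* Illustrative: store 30.0% as the scaled integer 3000. */
                npf_ext_param_u32(ext, "percentage", (uint32_t)(atof(val) * 100));
                return 0;
        }
        return EINVAL;
}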

The extension object will be passed to the kernel during the configuration load. The kernel module will be the consumer.

6.2. Kernel module

The kernel module of an NPF extension is the component which implements the actual functionality. It consumes the data provided by the parser module, i.e. the configuration provided from userspace. As there can be multiple rule procedures, there can be multiple configurations (extension objects) passed.

The kernel module should have the following:

  • Module definition: NPF_EXT_MODULE(npf_ext_<extname>, "");

  • Module control routine: static int npf_ext_<extname>_modcmd(modcmd_t cmd, void *arg);

  • Register itself on module load: void *npf_ext_register(const char *name, const npf_ext_ops_t *ops);

  • Unregister itself on module unload with: int npf_ext_unregister(void *extid);

See npf_ext_rndblock_modcmd() for an example of the control routine; a sketch is also given after the list below. A set of operations to register:

static const npf_ext_ops_t npf_rndblock_ops = {
        .version        = NPFEXT_RNDBLOCK_VER,
        .ctx            = NULL,
        .ctor           = npf_ext_rndblock_ctor,
        .dtor           = npf_ext_rndblock_dtor,
        .proc           = npf_ext_rndblock
};

The structure has the following members:

  • .version  — is used as a guard for the interface versioning

  • .ctx  — an optional "context" argument passed to each routine

  • .ctor  — constructor for each extension object received from the npfctl dynamic module

  • .dtor  — destructor for in-kernel extension objects

  • .proc  — the processing routine executed for each packet
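
For illustration, a minimal sketch of the control routine, which registers these operations on module load and unregisters them on unload. The rndblock_id variable, the chosen error values and the assumption that npf_ext_register() returns NULL on failure are illustrative, not the actual rndblock source:

static void *rndblock_id;

static int
npf_ext_rndblock_modcmd(modcmd_t cmd, void *arg)
{
        switch (cmd) {
        case MODULE_CMD_INIT:
                /* Register the extension and its operations with NPF. */
                rndblock_id = npf_ext_register("rndblock", &npf_rndblock_ops);
                return rndblock_id != NULL ? 0 : EEXIST;
        case MODULE_CMD_FINI:
                /* Unregister the extension on module unload. */
                return npf_ext_unregister(rndblock_id);
        default:
                return ENOTTY;
        }
}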

The constructor is called on configuration load. This routine should retrieve the data (extension object) passed by the parser module and create an in-kernel object associated with a rule procedure. The constructor shall have these arguments:

static int npf_ext_rndblock_ctor(npf_rproc_t *rp, prop_dictionary_t params);

  • rp — rule procedure to associate with

  • params — data from the parser module (as a property list dictionary, see proplib(3)).

A new object (metadata) shall be associated with the rule procedure using the npf_rproc_assign() routine.
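
The following constructor sketch illustrates the pattern; the rndblock_meta_t structure and the "percentage" dictionary key are hypothetical:

typedef struct {
        uint32_t        percentage;
} rndblock_meta_t;

static int
npf_ext_rndblock_ctor(npf_rproc_t *rp, prop_dictionary_t params)
{
        rndblock_meta_t *meta;

        /* Retrieve the value passed by the parser module. */
        meta = kmem_zalloc(sizeof(rndblock_meta_t), KM_SLEEP);
        prop_dictionary_get_uint32(params, "percentage", &meta->percentage);

        /* Associate the metadata with the rule procedure. */
        npf_rproc_assign(rp, meta);
        return 0;
}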

The destructor is called when the rule procedure is destroyed (due to a flush of the configuration, or a reload with the procedure removed). It shall have these arguments:

static void npf_ext_rndblock_dtor(npf_rproc_t *rp, void *meta);

  • rp — associated rule procedure

  • meta — metadata object

It is the responsibility of this routine to destroy the meta object and any other resources created in the constructor.
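
A matching destructor sketch, assuming the hypothetical rndblock_meta_t from the constructor sketch above:

static void
npf_ext_rndblock_dtor(npf_rproc_t *rp, void *meta)
{
        /* Free the metadata allocated in the constructor. */
        kmem_free(meta, sizeof(rndblock_meta_t));
}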

The processing routine is the key routine: it inspects the packet or the connection and can perform an arbitrary action (including modification of the packet) or decide its destiny (pass or block). This routine shall have the following arguments:

static void npf_ext_rndblock(npf_cache_t *npc, nbuf_t *nbuf, void *meta, int *decision);

  • npc — structure containing information about L3/L4 headers

  • nbuf — network buffer which can be inspected using NPF’s nbuf interface

  • meta — metadata object

  • decision — the current decision made by an upper layer, which may be NPF_DECISION_BLOCK or NPF_DECISION_PASS.

The extension may set the decision accordingly. Normally, an extension should not override NPF_DECISION_BLOCK.
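
The following processing routine sketch continues the hypothetical rndblock example; the use of cprng_fast32() for randomness and the 0..9999 range (matching the scaled percentage of the earlier sketches) are assumptions:

static void
npf_ext_rndblock(npf_cache_t *npc, nbuf_t *nbuf, void *meta, int *decision)
{
        const rndblock_meta_t *rb = meta;

        /* Never override a block decision made by an upper layer. */
        if (*decision == NPF_DECISION_BLOCK) {
                return;
        }
        /* Block the configured fraction of packets at random. */
        if (cprng_fast32() % 10000 < rb->percentage) {
                *decision = NPF_DECISION_BLOCK;
        }
}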

7. Troubleshooting

NPF provides information about the currently active configuration via the npfctl(8) "show" command. It should be noted that the output produced by this command may not conform to the syntax of the npf.conf file. It is also not recommended to use this command for high frequency requests, as it is not designed to scale.

NPF also provides general statistics via the command "stats". These and the packet logging interface can be used for troubleshooting. Additionally, there are debugging facilities, such as the "debug" command and the npftest utility, which are mainly targeted at developers.

8. Appendixes

8.1. Report a bug

Before reporting a bug, please carefully check the documentation and verify that the problem is in the software rather than a configuration mistake or a misinterpretation of the documentation. Additionally, search the existing GNATS database in case the problem has already been reported.

When submitting a new report, describe the problem clearly, be specific and gather extra information (npf.conf, network configuration, environment, etc). If the system is crashing, please:

  • Gather stack traces. Use the bt command at the DDB debugger prompt. It may be necessary to enable DDB-on-panic after system boot, using sysctl -w ddb.onpanic=1.

  • Attempt to reproduce the crash with a kernel compiled with the DEBUG, DIAGNOSTIC and LOCKDEBUG options. Note: the LOCKDEBUG option is very expensive.

The problem report (PR) can be submitted using the NetBSD problem report form.

8.2. Source code

The latest source code of NPF is located in the main tree of NetBSD.

8.3. Documentation

This documentation is generated using AsciiDoc. For read-only Git access, run:

git clone http://www.netbsd.org/~rmind/npf/.git

Note: this location may change in the future.

Patches can be submitted via GNATS. Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files.

Copyright (c) 2009-2014 The NetBSD Foundation, Inc.
All rights reserved.