Daniel Hartmeier is the original author of pf, the stateful packet filter that has been part of the OpenBSD project since the release of OpenBSD 3.0 in December of 2001. Living in Switzerland, Daniel continues to actively support and improve pf.

In this interview, Daniel discusses the history behind pf and describes the advanced feature set already present in this relatively young code base. The latest version of pf will be present in OpenBSD 3.2, which will be released this Friday, November 1, 2002.


Jeremy Andrews: Please share a little about yourself and your background...

Daniel Hartmeier: I'm 28 years old and live near Basel in Switzerland, in the same town as my mother and my two younger sisters. I dropped out of university after the first year and have been working as a programmer ever since. Unmarried, no kids, two cats.

JA: How and when did you get started with OpenBSD?

Daniel Hartmeier: It was around November 1999 when I installed OpenBSD (2.5 by then) for the first time. I was working at a small company and we were looking for an OS to use as an Internet gateway. I didn't have much prior experience with Unix, but after a while, that gateway handled most network services we needed. Since then, I have run OpenBSD on all servers and desktops I use.

JA: When you first started working on pf, OpenBSD was using a different packet filter. Why then did you start working on pf?

Daniel Hartmeier: I didn't start working on pf before June 2001. Theo had just removed its predecessor from the source tree after an argument about its license, and I was following discussions on icb about possible alternatives. Kjell Wooding was starting to port ipfw, and Mike Frantzen had been working on a filter of his own before, which wasn't complete yet. While searching for other packet filters I found Drawbridge, an ethernet layer filter written at tamu.edu. While manually applying the kernel patches I realized that the filter itself was basically a single C module and the interface into the kernel was very simple, just two hooks into the ethernet layer. I had never before done any kernel programming, but I knew C and once I had all network packets flow through a function of my own, I thought it would be interesting to see how much further I could go. If I had known in advance how many nights I would spend, I might have given up. But the progress kept me motivated, and by the end of the month, the code was filtering and doing NAT successfully.

JA: Did you intentionally choose the rather generic name 'pf' for your packet filter, or did this become the name by default?

Daniel Hartmeier: Frankly, I didn't spend any time thinking about a name and just used pf (for packet filter) in file and function names. It's short, easily associated with 'packet filter' and didn't collide with existing namespace, and we stuck with it.

JA: Is pf a direct modification to the OpenBSD kernel? Or is it a kernel module?

Daniel Hartmeier: The kernel part of pf consists of three source files now, grouping related functions (filtering, normalization and ioctls). There is no abstract interface, as for instance in FreeBSD, which supports multiple packet filters. We just call a single function, pf_test(), from ip_input() and ip_output(), where all packets from network interfaces pass. Additionally, the function is called from the bridge code and after encapsulated packets are unwrapped, so encapsulated packets pass through pf at every layer. The code is linked to the kernel (unless disabled by a #define). While OpenBSD supports loadable kernel modules, none of the kernel code in the base install is loaded dynamically.

JA: How much of the pf implementation is based on other packet filters?

Daniel Hartmeier: The stateful connection tracking is based directly on Guido van Rooij's work (which is also the basis for IPFilter). We check each sequence number in each TCP packet against narrow windows of legal values. Mike Frantzen wrote this implementation, and he also fine-tuned all parameters to minimize the number of mismatches in real traffic. I don't know about commercial firewalls, but I believe this is the best implementation of stateful filtering around. Linux' netfilter is heading in the same direction, I think.

Fragment reassembly and normalization (eliminating ambiguities in packets that a receiver might interpret in different ways) was written by Niels Provos, based on Vern Paxson's work. This is something very useful I haven't seen implemented in a packet filter before.

There are several small things, like the state table implementation which includes address translation mapping or the automatic and transparent rule evaluation optimization, that might be novel. But overall, pf is a combination of already well-known data structures and algorithms that implement the also established concept of real stateful packet filtering in the best possible way.

And, of course, many features that were suggested by various people are based on things they know from other packet filters.

JA: Can you describe in more detail how pf's stateful connection tracking works?

Daniel Hartmeier: The general idea in any stateful filter is to keep track of established connections and associate any packet with the connection it is part of. This allows the filter to automatically pass all packets belonging to a connection that was established through the filter. Basically, the rule set has to express only what connections to pass or block, instead of what packets. This makes rule sets simpler and more elegant, as the author doesn't have to manually address both directions in which packets of the same connection flow, or care about TCP handshakes and flags. And even an expert in stateless filtering can't precisely block all packets that are not part of properly established connections, since the filter must decide based only on the information contained in a single packet, without knowledge of packets seen earlier.

So now the packet filter keeps track of all established connections. Filter rules allow (or disallow) the creation of certain connections, and the packet filter takes care of associating all packets with those connections, blocking packets that neither match an existing connection nor create a new one. When a connection is closed (either through a FIN, RST or timeout), the state entry is removed.
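As an illustration added alongside the interview (not pf's code), the idea can be sketched as a toy model in Python. The class name, the packet dictionaries and the single-packet teardown on FIN or RST are all simplifying assumptions; a real filter also tracks TCP phases, timeouts and sequence numbers.

```python
# Toy model of stateful filtering: rules decide about connections,
# the state table decides about packets. All names are hypothetical.

def conn_key(pkt):
    """Direction-independent key, so both directions map to the same state."""
    a = (pkt["src"], pkt["sport"])
    b = (pkt["dst"], pkt["dport"])
    return (pkt["proto"],) + tuple(sorted([a, b]))


class StatefulFilter:
    def __init__(self, allowed_services):
        # Set of (destination address, destination port) pairs that the
        # "rule set" allows new connections to.
        self.allowed = allowed_services
        self.states = {}

    def test(self, pkt):
        key = conn_key(pkt)
        if key in self.states:
            # Packet belongs to a known connection: pass without
            # consulting the rules again.
            if pkt.get("flags") in ("FIN", "RST"):
                del self.states[key]  # simplified teardown (assumption)
            return "pass"
        # No state yet: only a packet that opens an allowed connection passes.
        if pkt.get("flags") == "SYN" and (pkt["dst"], pkt["dport"]) in self.allowed:
            self.states[key] = {"initiator": (pkt["src"], pkt["sport"])}
            return "pass"
        return "block"
```

Note that the rule side only names the service to allow; the reply direction is passed because it maps to the same state entry, not because a rule mentions it.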

To prevent attackers from tearing down connections, for instance with spoofed RSTs, the packet filter checks the sequence numbers in each TCP packet. Only the two peers involved in the connection (and the hops in between them) know the right sequence numbers, as initial sequence numbers are generated randomly (or should be, rather, but pf can also randomize sequence numbers for hosts that have predictable ISN generators).

The goal in sequence number comparison is to allow only a minimal window of values through. This is not as easy as it may appear from studying perfect examples of TCP connections. In reality, packets can get lost and are retransmitted, packets take different routes and may arrive in different order than they were sent, etc.

Guido's work shows how to keep lower and upper bounds on the sequence numbers given only the (incomplete) information the packet filter has, with a precision and beauty similar to the one you can find in a mathematical proof.
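To give a feel for what such bounds look like, here is a heavily simplified sketch added for illustration. It is not Guido van Rooij's algorithm and not pf's code: the class, its field names and the choice of bounds are assumptions, and a real filter updates the bounds with every packet it sees in both directions.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def seq_geq(a, b):
    """True if a >= b in wraparound sequence space."""
    return ((a - b) % MOD) < 2 ** 31


class Sender:
    """What the filter remembers about one direction (hypothetical fields)."""

    def __init__(self, isn, peer_window):
        self.sent_hi = (isn + 1) % MOD                   # end of newest data seen
        self.allowed_hi = (isn + 1 + peer_window) % MOD  # furthest byte the peer permits
        self.peer_max_win = peer_window                  # largest window the peer advertised


def in_window(sender, seq, length):
    """Accept a segment only if it lies between the lower and upper bound."""
    end = (seq + length) % MOD
    lower = (sender.sent_hi - sender.peer_max_win) % MOD
    return seq_geq(sender.allowed_hi, end) and seq_geq(seq, lower)
```

With these bounds, a spoofed RST carrying a guessed sequence number falls outside the window unless the guess lands within a few thousand values of the real one, out of 2^32 possibilities.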

JA: Is Guido's original stateful packet filtering work still available somewhere on the internet?

Daniel Hartmeier: Yes, on http://home.iae.nl/users/guido/papers/tcp_filtering.ps.gz. Anyone who's interested in how sequence number comparison works exactly will find this very interesting, and it's the key to understanding the more cryptic values in verbose pf status output and log files.

JA: What efforts have been made to optimize pf's stateful tracking for performance?

Daniel Hartmeier: To eliminate bottlenecks, the first thing I needed was a setup that would actually push the code to its limits. Old hardware and high rates of small packets can do that. The kernel profiler can then show what functions use up what fractions of cpu time. When filtering statelessly, rule set evaluation happens for every packet (even twice, when it passes two interfaces). Once it was clear how expensive rule set evaluation was, we optimized it by using (more) skip steps.

JA: What limitations are there on the maximum number of active states?

Daniel Hartmeier: The only limit is the amount of memory. A generic kernel on a machine with 64MB of RAM can handle more than 65000 states, and the limit extends linearly when adding more memory. The cost of state lookups is O(log n), which means it scales even better. I don't know of anyone who has actually reached the limit in production. People with several hundred thousand concurrent connections can usually afford 512MB of RAM. The bottleneck in most real setups is the packet rate, and often it's not the packet filter code that exhausts the CPU resources but interrupt rates of network interface drivers and other factors.

JA: Taking into account the limitations imposed by hardware, what is the maximum packet rate pf can be expected to handle?

Daniel Hartmeier: The smallest legal ethernet frame is 84 bytes, which means the maximal rate is 14880 packets per second for 10 Mbps and 148809 pps for 100 Mbps. If the machine can handle those rates with pf disabled (which depends on hardware and network interface driver), just enabling pf with an empty rule set does not impact performance. With increasing rule set and state table size, the impact increases. A rule set with 20-30 rules, which creates around 5000 concurrent states and has to be evaluated only once per connection (which consists of multiple packets), will typically not affect throughput rate. When the rule set gets larger or has to be evaluated for each packet, or when the state table gets very large, packet loss occurs when the CPU is not fast enough. But real traffic consists of larger packets on average, which means packet rates of 16000 pps are common, and handled without loss by pf.
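The packet rates quoted above follow directly from the frame size. The 84 bytes are the 64-byte minimum frame plus the 8-byte preamble and the 12-byte inter-frame gap that each frame occupies on the wire. A small check, with a hypothetical helper name:

```python
def max_pps(link_bps, wire_bytes=84):
    """Upper bound on packets per second for minimum-size ethernet frames."""
    return link_bps // (wire_bytes * 8)


print(max_pps(10_000_000))   # 14880
print(max_pps(100_000_000))  # 148809
```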

JA: How much overhead does fragment reassembly and normalization add to packet filtering?

Daniel Hartmeier: There's a choice of different kinds of fragment handling in pf now. You can completely reassemble fragments, at the cost of memory for the fragment cache, or just remove overlapping fragment sections, or leave fragments as they are and deal with them using filter rules.

It's important to realize that for stateful filtering, the packet filter needs a complete TCP header in each packet, otherwise it can't be associated with a connection (lacking port and sequence numbers). Reassembling fragments allows the filter to deal only with complete packets, reducing the rule set complexity. In my opinion, it's well worth the additional cost. pf allows you to specify what packets to normalize in which ways, so you can handle notoriously fragmented but otherwise known-good traffic separately.

JA: Can you describe how combining the state table with the network address translation mapping works, and what benefits this offers?

Daniel Hartmeier: Some packet filters treat state entries and address/port translations independently and store them in separate data structures. For each packet, they do both a state and a translation lookup. pf implicitly creates state for all translated connections and stores the information needed for translation in the state entry. This simplifies and reduces lookups.
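A minimal sketch of that design, added for illustration: one state entry per connection carries the translation, and both directions find it with a single lookup. The class, the dictionary-based table and the sequential port allocation are assumptions, not pf's data structures.

```python
class NatStateTable:
    """Toy state table whose entries also hold the NAT mapping."""

    def __init__(self, external_addr, first_port=50000):
        self.external_addr = external_addr
        self.next_port = first_port  # naive port allocation (assumption)
        self.states = {}

    def outbound(self, src, sport, dst, dport):
        """Internal host to outside peer: returns the rewritten 4-tuple."""
        key = ("out", src, sport, dst, dport)
        state = self.states.get(key)
        if state is None:
            state = {
                "internal": (src, sport),
                "external": (self.external_addr, self.next_port),
                "peer": (dst, dport),
            }
            self.next_port += 1
            # Index the same entry under both directions.
            self.states[key] = state
            self.states[("in", dst, dport) + state["external"]] = state
        return state["external"] + state["peer"]

    def inbound(self, src, sport, dst, dport):
        """Outside peer to external address: one lookup yields state and mapping."""
        state = self.states.get(("in", src, sport, dst, dport))
        if state is None:
            return None  # no state, so nothing to translate
        return state["peer"] + state["internal"]
```

A reply is matched and translated in the same step, and a packet without state is rejected without any separate translation table being consulted.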

JA: Regarding the automatic and transparent rule evaluation optimization, are you referring to "skip steps"? How do these work?

Daniel Hartmeier: I think this is best explained with a small example. Rule sets in pf consist of a list of rules which are evaluated (for a given packet) from top to bottom. Each rule contains parameters (like addresses or ports) that specify whether the rule applies to a packet, like in this example:

  block in proto tcp from any to any
  pass in proto tcp from any to 1.2.3.4 port ssh keep state
  pass in proto tcp from any to 1.2.3.4 port smtp keep state

In general, rule evaluation means traversing the entire list and finding the last matching rule, which decides whether the packet is passed or blocked. Imagine evaluating this rule set for a UDP packet. The first rule does not match because it specifies protocol TCP, so evaluation would continue with the second rule. But the second (and third) rule also specify the same protocol (TCP, which we already know does not match), so they can't possibly match, either.

Or imagine a TCP packet with destination address 5.6.7.8. The first rule matches, so evaluation continues with the second rule. It does not match, since the specified destination address is different. Because the third rule also specifies the same destination address, it can't possibly match.

And this is what skip steps are. For each parameter in each filter rule, the number of subsequent rules that specify the exact same value is counted. When, during evaluation of a rule, a parameter is found to not match, evaluation is not necessarily continued on the very next rule, but all subsequent rules that can't possibly match are skipped.

This optimization is transparent: it never influences the outcome of the evaluation. The cost of calculating the skip step values occurs only once, when the rule set is loaded. Depending on how many subsequent rules share equal parameters, this speeds up evaluation significantly. In the worst case, nothing is gained but no additional cost occurred, either. We saw average rule evaluation cost decrease by nearly 50 percent when we first added skip steps for the parameter specifying the interface, for instance.
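The example rule set can be used to sketch the technique in Python. This is an illustration added to the interview, not pf's implementation: the rule dictionaries, the three fields, and the default of passing a packet that no rule matches are assumptions made for the sketch.

```python
FIELDS = ("proto", "dst", "dport")

# The three example rules; None stands for "any".
RULES = [
    {"action": "block", "proto": "tcp", "dst": None, "dport": None},
    {"action": "pass", "proto": "tcp", "dst": "1.2.3.4", "dport": 22},
    {"action": "pass", "proto": "tcp", "dst": "1.2.3.4", "dport": 25},
]


def compute_skip_steps(rules):
    """For each rule and field, the index of the next rule with a different value."""
    n = len(rules)
    skip = [dict() for _ in rules]
    for i in range(n - 1, -1, -1):
        for f in FIELDS:
            if i + 1 < n and rules[i + 1][f] == rules[i][f]:
                skip[i][f] = skip[i + 1][f]
            else:
                skip[i][f] = i + 1
    return skip


def evaluate(rules, pkt, skip=None):
    """Last matching rule wins. Returns (action, number of rules examined)."""
    action, examined, i = "pass", 0, 0  # default action is an assumption
    while i < len(rules):
        rule = rules[i]
        examined += 1
        nxt = i + 1
        matched = True
        for f in FIELDS:
            if rule[f] is not None and rule[f] != pkt[f]:
                matched = False
                if skip is not None:
                    nxt = skip[i][f]  # jump past rules sharing this value
                break
        if matched:
            action = rule["action"]
        i = nxt
    return action, examined
```

For the UDP packet from the example, evaluation with skip steps examines one rule instead of three; for the TCP packet to 5.6.7.8 it examines two instead of three. The resulting action is the same with and without skip steps.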

JA: What limitations are imposed on the length of rule sets in pf?

Daniel Hartmeier: There's no technical limit, as rule sets need little memory. But the cost of rule set evaluation can increase linearly with the number of rules in the worst case. When filtering statefully, the rule set has to be evaluated only once for each connection, so the number of newly established connections per second becomes the limiting factor. Rule sets with 10 to 1000 rules are common. Extremely large rule sets with more than 100000 rules are obviously generated automatically from databases. In such cases, optimizing rule order manually becomes important, as rule evaluation will likely be the bottleneck of the packet filter. For most people (including me), rule set length is limited by human factors: huge rule sets are hard to understand and maintain. Often, large rule sets indicate lack of design. pf tries to aid in writing small rule sets. Often a complete rule set does not exceed 20-30 rules, thanks to macro substitution and other parser features.

JA: How intensive is the creation of skip steps? For example, when working with extremely large rule sets such as what you have described with more than 100,000 rules, how much of an added delay will there be while pf loads the ruleset?

Daniel Hartmeier: On the machine I'm writing this on, it takes 0.6s to load 1000 rules, 14s for 5000 and 60s for 10000 rules. I haven't profiled yet to tell how much of this is caused by skip step calculation, though. For really huge rule sets, the way single rules are passed through ioctl might not be optimal.

JA: Regarding the use of ioctl, there's a comment in the 'pf' man page that I did not fully understand. It reads, "Manipulations like loading a rule set that involve more than a single ioctl call require a so-called ticket, which prevents the occurance of multiple concurrent manipulations." Can you explain what this means and how it works?

Daniel Hartmeier: Updating and querying rule sets from userland is not done atomically in a single ioctl call, as the data passed to and from the kernel can be large. Instead, an ioctl call transfers only a single rule at a time. To prevent multiple concurrent updates or queries, which could result in inconsistent rule sets, a simple locking scheme is used. Before the ioctl that reads or writes a single rule can be called, the caller has to obtain a lock (in the form of a number, or ticket) through a separate ioctl call, which is an atomic operation. The ticket has to be passed in subsequent ioctls. When two processes try to perform concurrent manipulations, the second ticket issued invalidates the first, and the first process will get an error from further ioctl calls, because its ticket has become invalid. This rarely occurs when manipulating rule sets manually with pfctl, but daemons like authpf (which is used to insert and remove packet filter rules after users authenticate themselves using ssh) can potentially try to access the rule sets concurrently.
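The ticket scheme can be modeled in a few lines. This sketch is an illustration, not the pf ioctl interface: the class, the method names, the exception, and the split into a staged and an active rule list are assumptions chosen to show how a newer ticket invalidates an older one.

```python
class StaleTicket(Exception):
    """Raised when a caller presents a ticket that has been superseded."""


class RuleSetKernel:
    """Toy stand-in for the kernel side of a multi-call rule set update."""

    def __init__(self):
        self.active = []   # rules currently in effect
        self.staged = []   # rules being loaded (assumption of this sketch)
        self.ticket = 0

    def begin(self):
        """Atomic step: hand out a new ticket, invalidating any earlier one."""
        self.ticket += 1
        self.staged = []
        return self.ticket

    def add_rule(self, ticket, rule):
        """One rule per call, accepted only with the current ticket."""
        if ticket != self.ticket:
            raise StaleTicket("another update started in the meantime")
        self.staged.append(rule)

    def commit(self, ticket):
        if ticket != self.ticket:
            raise StaleTicket("another update started in the meantime")
        self.active, self.staged = self.staged, []
```

If a second process calls begin() while the first is halfway through loading, the first process gets StaleTicket on its next call, so rules from two loaders are never interleaved in one rule set.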

JA: How many people are actively working on pf?

Daniel Hartmeier: There are about five people that primarily work on pf, but almost every one of the OpenBSD developers has done some part of pf at one point in time. If you look at the commit logs, you'll see many names, and often there are more people involved in the work that leads to a commit than is visible.

JA: How active are you in pf development these days?

Daniel Hartmeier: I try to spend most of my free time on OpenBSD, and while I enjoy exploring new areas of code, a large part of my work is still focused on pf. I can spend more time on testing and debugging now that the rate at which new code is added has decreased, which I enjoy a lot.

JA: How has pf changed between its original implementation in OpenBSD 3.0, and its current implementation in OpenBSD 3.2?

Daniel Hartmeier: Mainly in stability, performance and features. After 3.0 was released, the user base grew significantly, and the feedback helped track down many smaller and a few larger problems. Performance was already surprisingly good in 3.0, but some bottlenecks were found after benchmarking, and these have been removed. And of course many new, some unique, features were added. I think the most important change isn't found in the code at all, it's the fact that we now know for sure that pf works efficiently and securely in large, productive systems. I'm proud we convinced most of the people that were (understandably) cautious and reluctant to trust their networks to a merely five months old packet filter when pf was first shipped with 3.0.

JA: What are some of the bottlenecks that were present in 3.0 that have since been fixed?

Daniel Hartmeier: The most prominent one was the fact that pf evaluated the rule set for each packet on each interface it passed through, without skipping over certain rules (scrub) or using possible skip steps.

JA: How does pf performance compare to other stateful packet filters?

Daniel Hartmeier: In the benchmarks I did and based on the feedback from people who compared pf with other filters on production machines, very well, often significantly better. In particular, we found that keeping state on all connections scales well and is faster than stateless rule evaluation.

JA: Are any of these benchmarks available online?

Daniel Hartmeier: Yes, they were done as part of a Usenix paper which you can find at http://www.benzedrine.cx/pf-paper.html

The most important, and possibly surprising conclusion was that state lookups are far more efficient than rule set evaluations, and keeping state, apart from simplifying rule sets and improving filtering decisions, does improve performance. Anyone who is filtering statelessly based on the assumption that keeping state is too expensive might find this very interesting.

JA: How effective is pf with IPv6?

Daniel Hartmeier: IPv6 support was a strong objective from the beginning, and already 3.0 was able to filter IPv6, thanks to the work done by Jun-ichiro itojun Hagino and Ryan McBride. And support will further improve as IPv6 deployment increases. I'm filtering my IPv6 tunnel at home, and it works very well.

JA: What unique challenges does filtering IPv6 packets present to a packet filter?

Daniel Hartmeier: There's no fundamental difference, but there are a lot of small details to be taken care of (like header options). I wasn't familiar with IPv6 before pf, not to the same degree as with IPv4. I think people will also use different approaches filtering IPv6, there's no need for NAT anymore, instead the same host will end up having numerous aliases. You can assign an address for each service instead of filtering on port numbers. And since most people are still tunneling IPv6 over IPv4 right now, pf is used to filter both tunneled and decapsulated IPv6.

JA: What protection does pf provide against IP spoofing?

Daniel Hartmeier: The sequence number comparison done in stateful filtering prevents anyone not knowing the appropriate sequence numbers from injecting TCP packets into a foreign connection. When initial sequence numbers are chosen randomly (or pf's sequence number modulation is used), predicting sequence numbers becomes unfeasible.

As for spoofed SYNs which create state entries, pf allows you to specify state timeouts for all phases of a TCP connection, on a per-rule level. For instance, you can use low timeout values for states of not fully established TCP connections. The spoofer, unable to complete the TCP handshake without having seen the peer's ISN in the SYN+ACK (which was sent back to the spoofed source, and dropped there), can't create long-lived states and exhaust the state table. Also, each rule creating state can specify a maximum number of states it is allowed to create, preventing one class of connections from exhausting resources for another class.
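How the two mechanisms interact can be shown with a toy model, added here for illustration. The class, its parameters and the explicit clock argument are assumptions; pf's actual timeout names and values are not reproduced.

```python
class RuleStates:
    """States created by one rule, with a cap and phase-dependent timeouts."""

    def __init__(self, max_states, opening_timeout, established_timeout):
        self.max_states = max_states
        self.opening_timeout = opening_timeout          # short: handshake not finished
        self.established_timeout = established_timeout  # long: handshake completed
        self.states = {}

    def expire(self, now):
        self.states = {k: s for k, s in self.states.items() if s["expires"] > now}

    def on_syn(self, key, now):
        """Create a state for a new connection unless the rule's cap is reached."""
        self.expire(now)
        if len(self.states) >= self.max_states:
            return False
        self.states[key] = {"established": False, "expires": now + self.opening_timeout}
        return True

    def on_handshake_complete(self, key, now):
        """Only a peer that finishes the handshake earns the long timeout."""
        state = self.states.get(key)
        if state is None or state["expires"] <= now:
            return False
        state["established"] = True
        state["expires"] = now + self.established_timeout
        return True
```

States opened by spoofed SYNs never reach on_handshake_complete(), so they expire after the short timeout and free their slots, while the cap keeps them from crowding out states that belong to other rules.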

I'm aware of other approaches to these problems, like SYN cookies, but there are currently no plans for implementing them in pf, since the existing methods work very well.

JA: What future plans do you have for pf?

Daniel Hartmeier: There are plans for redundancy and fail-over, which are unfortunately still obstructed by patent issues. Load-balancing of various kinds is also an often requested feature now being worked on. Right now, altq is being merged with pf, so a unified rule set can enqueue packets with a single evaluation.

JA: What patent issues are obstructing the work towards redundancy and fail-over?

Daniel Hartmeier: Cisco holds a patent on VRRP, which describes how to set up multiple redundant firewalls, detect failure and automatically switch to a backup (sharing the same MAC address so the change doesn't require massive ARP lookups). I haven't seen a better approach, but unfortunately the licensing conditions make it unusable for OpenBSD.

JA: Are you currently looking into alternative methods for providing redundancy and automatic failover?

Daniel Hartmeier: Yes. If someone knows of alternatives to VRRP that are free 'for anyone to use for any purpose, without restrictions', I'm very interested.

JA: What will altq add to pf when it is fully merged?

Daniel Hartmeier: It's the other way around, altq will continue to do what it already does (manipulating output queues on network interfaces, limiting bandwidth of specified categories of traffic), but pf will assign the packets to the queues. Since rule evaluation and connection tracking is already done in pf, doing queue assignment there will reduce per-packet cost, and combining configuration parsing will simplify syntax.

JA: When can we expect to see the merged altq and pf as part of the OpenBSD distribution?

Daniel Hartmeier: Almost certainly in 3.3, which will ship in around six months.

JA: Are you aware of any efforts to create a graphical user interface for pf, to simplify configuration and maintenance?

Daniel Hartmeier: There are several projects (you can find all the links at the bottom of http://www.benzedrine.cx/pf.html). I do understand that many users would like to have an interface that makes configuration simpler. But the complexity doesn't consist of editing a text file or getting the rule syntax right, in my opinion. It's about understanding TCP/IP and protocols, as well as how rule evaluation and stateful filtering work. I guess a valuable user interface would allow one to formulate filtering policies more abstractly but still support the full spectrum of common policies. It might benefit from being graphical, if the visualization makes policies easier to formulate. I haven't seen a firewall (G)UI that achieves this, though. Many focus primarily on graphical editing, while not providing the more important abstraction model. Honestly, I don't know how I would do it better. I think it's more efficient, in the long term, to climb the learning curve and write the rules manually.

JA: Have you worked with other open source kernels besides OpenBSD?

Daniel Hartmeier: I follow kernel source changes of other BSD and Linux trees in some areas, but I haven't worked actively with them, no.

JA: Are you aware of any efforts to port pf to other kernels, such as FreeBSD, NetBSD, or Linux?

Daniel Hartmeier: I've exchanged mails with several people who were porting pf to other BSDs, yes. I'm not very familiar with other kernels, but I'll gladly answer questions from anyone interested, even though I don't have enough time to write and maintain such patches myself.

JA: How do you enjoy spending your time when you're not working on OpenBSD?

Daniel Hartmeier: Apart from OpenBSD and my daytime job, I don't have much time left. I like to play with my cats, ride my motorcycle, and forget work watching a movie or playing RPGs, like everyone else, I suppose.

JA: I sometimes get the impression that OpenBSD developers are perceived as being less tolerant than other open source development teams. Additionally, the rift between OpenBSD creator Theo de Raadt and the NetBSD team is well known. From your own perspective, how do you find interaction with other members of the OpenBSD team?

Daniel Hartmeier: I've never experienced intolerance among OpenBSD developers. There are sometimes strong opinions, but they are always based on technical subjects. Yes, Theo is well known for defending standpoints uncompromisingly, and that's very valuable when the point in question is right but unpopular. Cooperation with OpenBSD developers has been one of the best experiences in team work in my life, and I'm glad I had the chance to learn in this environment. This is the first time I've actively participated in an open source project, maybe this is the reason why it's so different, but it's definitely better than any team work motivated by financial interests I know.

JA: Thank you very much for taking the time to answer all my questions, and for all the hard work you've put into pf! I've been confidently using pf to secure my home networks since shortly after OpenBSD 3.0 was released without experiencing any problems.

Daniel Hartmeier: Thank you for the opportunity to talk about pf, the pleasure was all mine.

