Source-Based Routing with pf(4)

I’ve been playing around for a while now with FreeBSD’s setfib(1) command and multiple routing tables. Having multiple gateways and ensuring that the right traffic is routed to the right gateway is usually a huge PITA. The ‘standard’ way to deal with this problem seems to be using multiple routing tables (or more generally FIBs) and using setfib(1) to route the traffic. This works pretty great – until there’s a case where it doesn’t.

What setfib(1) does work for, really well:

  • egress traffic originating from a socket (or well, process) on the host.
  • non-VNET(9) Jails. Being able to route the complete traffic from a jail through a different gateway (e.g. a VPN) is great. Really, really great.

However, it falls short in a number of cases that aren’t that simple anymore:

  • routing ingress traffic on a physical interface (or tap(4), for that matter) on an if_bridge(4) interface
  • multiple NATs with different upstream gateways
  • software that binds several sockets with varying privileges (e.g. public service socket, private administration interface on a different socket) on the same interface.

The setup that prompted me to dive into this is the second case, having multiple pf(4) NATs with different upstreams. In particular, I’ve got one NAT on an if_bridge(4) that should be routed exclusively through an openvpn(8) tun(4) device and a second if_bridge(4) with a NAT that should be routed according to the default routing table:

graph TB;
  subgraph VPN
    tun0["tun0<br/>192.168.42.23/24<br/>gw 192.168.42.1"] -.-| NAT | bridge0["bridge0<br/>10.13.37.1/24"]
    bridge0 --- tap0
    bridge0 --- tap1
    bridge0 --- tap2
    bridge0 --- tap3
  end
  subgraph routed normally
    em0["em0<br/>12.34.56.78/27<br/>gw 12.34.56.65"] -.-| NAT | bridge1["bridge1<br/>192.168.122.1/24"]
    bridge1 --- tap4
    bridge1 --- tap5
    bridge1 --- tap6
    bridge1 --- tap7
  end

By default, the routing table has 12.34.56.65 as the default route:

# netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            12.34.56.65        UGS         em0

And the corresponding pf.conf also looks pretty simple:

[...]
# redirection
nat on em0 from bridge1:network -> (em0)
nat on tun0 from bridge0:network -> (tun0)

# filtering
pass out on {em0, tun0} keep state
pass on {bridge0, bridge1}
[...]

OpenVPN Route-Fu

I’m using OpenVPN as a client for the tun(4) device, and don’t really have any control over the OpenVPN server. However, as most OpenVPN providers do, the OpenVPN server I connect to pushes a default route to my client. This might make sense for the average desktop use case, where you’d want to route all traffic through the VPN, but as soon as you depend on an SSH connection to the host, having a daemon mess with the default route is a horrible experience.

autossh(1) with remote port-forwarding of the SSH port and a good portion of sleep ${DURATION}; killall openvpn really helps. If the ssh session just happens not to die, you can still kill the sleep(1).
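A minimal sketch of that safety net, assuming a POSIX shell. The helper names arm_failsafe and disarm_failsafe and the FAILSAFE_PID variable are made up for this example; only sleep(1), kill(1) and killall(1) are real commands.

```shell
#!/bin/sh

# Hypothetical helper: run a command after a delay unless disarmed first.
# Usage: arm_failsafe <seconds> <command> [args...]
arm_failsafe() {
    delay=$1
    shift
    # The subshell sleeps in the background and then runs the command.
    ( sleep "$delay"; "$@" ) >/dev/null 2>&1 &
    FAILSAFE_PID=$!
}

# Hypothetical helper: cancel the pending command by killing the subshell.
disarm_failsafe() {
    kill "$FAILSAFE_PID" 2>/dev/null
}
```

One way to use it would be to call arm_failsafe 600 killall openvpn right before starting the VPN client, and disarm_failsafe once the SSH session has proven that it survived the route change.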

OpenVPN since 2.4 includes a pull-filter option to filter out options you don’t want to have pushed, which is definitely worth checking out, but for a variety of reasons, that isn’t really an option here. I’ll get to those shortly. For now, suffice it to say that you could completely ignore the default gateway being pushed by adding pull-filter ignore "route-gateway" to the OpenVPN configuration.

Pull filters are matched from the beginning of the option, so technically a pull-filter ignore "route" would do it too, and also ignore any other routes being pushed. However, pull-filter ignore "route " (note the whitespace) would only ignore other routes pushed, but not the default route.

Not really trusting the server, I went with a default-deny, selective allow policy:

pull-filter accept "route 192.168.42."
pull-filter accept "route-gateway"
pull-filter ignore "route"

Note that here I’m actively allowing the server to push a default route though.
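To see what these three filters do to a given pushed option, here is a small shell emulation of the matching as I understand it from the OpenVPN manual: prefix match, first matching filter wins, unmatched options are accepted. The functions pull_filter_verdict and my_filters are hypothetical helpers for this sketch, not part of OpenVPN.

```shell
#!/bin/sh

# Hypothetical helper: emulate pull-filter matching for one pushed option.
# Usage: pull_filter_verdict "<pushed option>" <action> "<prefix>" [...]
pull_filter_verdict() {
    option=$1
    shift
    while [ "$#" -ge 2 ]; do
        action=$1
        prefix=$2
        shift 2
        case "$option" in
            # The quoted prefix is matched literally at the start of the option.
            "$prefix"*) echo "$action"; return 0 ;;
        esac
    done
    # No filter matched: the option is accepted.
    echo accept
}

# The three filters from the configuration above, in the same order.
my_filters() {
    pull_filter_verdict "$1" \
        accept "route 192.168.42." \
        accept "route-gateway" \
        ignore "route"
}
```

For example, my_filters "route 10.0.0.0 255.0.0.0" prints ignore, while my_filters "redirect-gateway def1" prints accept, because none of the three prefixes match that option.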

Ignoring the pushed gateway is a bad idea, because then we don’t know where to send the packets that we do want to route via the VPN. The route (no pun intended) to go seems to be the same as before version 2.4: set the route-noexec option and use a route-up script to set up the routing:

route-noexec       # Do not run /sbin/route add on the routes
script-security 2  # Allow custom scripts to be called
route-up /path/to/route-up.sh

Calling env(1) in that script shows, among others, the environment variables we’re going to need:

  • ${dev} is the name of the tun(4) device.
  • ${route_vpn_gateway} is the gateway pushed with the route-gateway option. This is where we will want to route the traffic from the if_bridge(4) interface.
  • ${route_net_gateway} is the gateway over which the VPN server is reachable.
  • ${untrusted_ip} is the IP of the VPN server. We will have to add a route to this via the default gateway.

Other routes might only be available in the up script and not in the route-up script. This configuration works for me, but most likely not in the general case.
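Since which variables are exported depends on the server and on the hook being run, it may be worth failing early when one of them is missing. A minimal sketch, with require_env being a hypothetical helper name:

```shell
#!/bin/sh

# Hypothetical helper: return non-zero if any of the named environment
# variables is unset or empty.
require_env() {
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "route-up: \$$name is not set" >&2
            return 1
        fi
    done
}
```

At the top of the route-up script this could be called as require_env dev route_vpn_gateway route_net_gateway untrusted_ip || exit 1.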

Putting this together, we can replace the default /sbin/route commands run by openvpn(8) with the following route-up.sh script:

#!/bin/sh

# This still changes the default route, which we don't actually want, but we'll
# replace this in the next step.
route add default ${route_vpn_gateway}
route add -net ${untrusted_ip} ${route_net_gateway} 255.255.255.255

pf(4) route-to and friends

This is where the magic happens. route-to is not a particularly well-documented feature; pf.conf(5) states:

route-to
The route-to option routes the packet to the specified interface with an optional address for the next hop. When a route-to rule creates state, only packets that pass in the same direction as the filter rule specifies will be routed in this way. Packets passing in the opposite direction (replies) are not affected and are routed normally.

However, the BNF grammar further down in the manpage is a bit misleading regarding the optional address in routehost, as pointed out in this freebsd-pf mailing list post:

pf-rule        = action [ ( "in" | "out" ) ]
                 [ "log" [ "(" logopts ")"] ] [ "quick" ]
                 [ "on" ifspec ] [ "fastroute" | route ] [ af ] [ protospec ]
                 hosts [ filteropt-list ]
[...]
route          = ( "route-to" | "reply-to" | "dup-to" )
                 ( routehost | "{" routehost-list "}" )
                 [ pooltype ]
[...]
routehost      = "(" interface-name [ address [ "/" mask-bits ] ] ")"

When only the interface-name is given in the routehost specification, the syntax is ambiguous with the parenthesis-wrapped interface specification as described in the manpage:

Surrounding the interface name (and optional modifiers) in parentheses changes this behaviour. When the interface name is surrounded by parentheses, the rule is automatically updated whenever the interface changes its address. The ruleset does not need to be reloaded. This is especially useful with nat.

Daniel Hartmeier’s response is however quite illuminating:

[…] the parentheses are needed when both interface and address are specified. If only the interface is specified, no parentheses are needed or allowed […]

He goes on to explain the effect of not specifying the next-hop address:

If you specify the addresses, this address is used for an arp lookup, and the ethernet frame will have this IP address’ MAC address as destination.

If you don’t specify the address, the destination IP address of the matching packet is used for the arp lookup instead!

If that destination IP address is not local (i.e. must be sent through a next-hop), you MUST specify the next-hop address, or the packet will be dropped, as arp resolution will fail.

So, specifying the next-hop address is not really “optional”. You may even have to split a route-to rule into two separate rules (one with and the other without specifying the next-hop), when some (but not all) possibly matching destinations are local (arp resolvable).

This means we will have to add a pass rule for traffic arriving on the if_bridge(4) device to route it to the ${route_vpn_gateway}. To prevent the complete pf ruleset from being reloaded (with possible state flushes) on every VPN connection, we move the NAT logic to an anchor and load that from our route-up.sh script:

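#!/bin/sh
# Load the NAT and filter rules into the "openvpn" anchor, passing the tunnel
# device and the pushed gateway in as macros.
pfctl -a openvpn -f /path/to/pf.openvpn.conf -D "tun_dev=${dev}" -D "tun_gw=${route_vpn_gateway}"
# Keep the VPN server itself reachable via the original default gateway.
route add -net "${untrusted_ip}" "${route_net_gateway}" 255.255.255.255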

Here, pfctl(8)’s -D switch is extremely useful to pass the environment variables into the anchor as macros. The pf.openvpn.conf anchor then becomes:

vpn_bridge = "bridge0"

nat on $tun_dev from $vpn_bridge:network -> ($tun_dev)

pass out quick on $tun_dev keep state
pass out quick on $vpn_bridge keep state

pass in quick on $vpn_bridge route-to $tun_dev keep state
pass in quick on $vpn_bridge route-to ($tun_dev $tun_gw) keep state

This anchor is attached to the main pf(4) ruleset with the appropriate nat-anchor openvpn and anchor openvpn attachment points in the translation and filtering segments, respectively. I use pass quick so that the filter anchor does not need to be the last rule in the main ruleset due to pf(4)’s last-match-wins policy.
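The rule split Daniel Hartmeier describes (one route-to rule without a next hop for destinations that are ARP-resolvable on the outgoing interface, one with a next hop for everything else) can be made explicit by qualifying the first rule with a destination. The following is only a sketch: route_to_rules is a hypothetical generator, and the interface names and addresses in the usage note are placeholders rather than part of my setup.

```shell
#!/bin/sh

# Hypothetical generator: print two pf rules for traffic entering in_if.
# Usage: route_to_rules <in_if> <out_if> <next_hop> <onlink_net>
route_to_rules() {
    in_if=$1
    out_if=$2
    next_hop=$3
    onlink_net=$4
    # On-link destinations: interface only, no parentheses, so the packet's
    # own destination address is used for the ARP lookup.
    printf 'pass in quick on %s route-to %s from any to %s keep state\n' \
        "$in_if" "$out_if" "$onlink_net"
    # Everything else: interface plus next hop, wrapped in parentheses.
    printf 'pass in quick on %s route-to (%s %s) from any to any keep state\n' \
        "$in_if" "$out_if" "$next_hop"
}
```

Calling route_to_rules em2 em1 10.0.0.1 10.0.0.0/24 prints the two rules in an order where the narrower on-link rule is evaluated first, which matters because both use quick. The output can be piped into pfctl -n -f - to have it parsed without loading anything.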

Putting it all together

What we’re still missing:

  • a route-down script that will flush the anchor and delete the route to the untrusted_ip.
  • redirects have been neglected. These will most likely need reply-to pass rules.
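For the first item, a teardown could look roughly like the sketch below. It is untested against a live tunnel. The run wrapper and the DRY_RUN switch are hypothetical additions for this example, and whether OpenVPN's down or route-pre-down hook is the right place to call it is an assumption that needs checking against your OpenVPN version.

```shell
#!/bin/sh

# Hypothetical wrapper: print the command instead of running it when
# DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Undo what route-up.sh did.
vpn_teardown() {
    # Flush the filter and NAT rules that were loaded into the anchor.
    run pfctl -a openvpn -F rules
    run pfctl -a openvpn -F nat
    # Remove the route to the VPN server added by route-up.sh.
    run route delete -net "${untrusted_ip}" "${route_net_gateway}" 255.255.255.255
}
```

Running DRY_RUN=1 vpn_teardown with the OpenVPN environment variables set shows the three commands without touching pf(4) or the routing table.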