pfSense is one of very few open source solutions offering enterprise-class high availability capabilities with stateful failover, allowing the elimination of the firewall as a single point of failure. High Availability is achieved through a combination of features:
CARP for IP address redundancy
XML-RPC for configuration synchronization
pfsync for state table synchronization
With this configuration, units act as an “active/passive” cluster with the primary node working as the master unit and the secondary node in a backup role, taking over as needed if the primary node fails.
Though often erroneously called a “CARP Cluster”, two or more redundant pfSense firewalls are more aptly titled a “High Availability Cluster” or “HA Cluster”, since CARP is only one of several technologies used to achieve High Availability with pfSense, and in the future CARP could be swapped for a different redundancy protocol.
One interface on each cluster node will be dedicated for synchronization tasks. This is typically referred to as the “Sync” interface, and it is used for configuration synchronization and pfsync state synchronization. Any available interface may be used.
Some call this the “CARP” interface but that is incorrect and very misleading. CARP heartbeats happen on each interface with a CARP VIP; CARP traffic and failover actions do not utilize the Sync interface.
The most common High Availability cluster configuration includes only two nodes. It is possible to have more nodes in a cluster, but they do not provide a significant advantage.
It is important to distinguish between the three functions (IP address redundancy, configuration synchronization, and state table synchronization), because they happen in different places. Configuration synchronization and state synchronization happen on the sync interface, directly communicating between firewall units. CARP heartbeats are sent on each interface with a CARP VIP. Failover signaling does not happen on the sync interface, but rather it happens on every CARP-enabled interface.
Common Address Redundancy Protocol (CARP) was created by OpenBSD developers as a free, open redundancy solution for sharing IP addresses among a group of network devices. Similar solutions already existed, primarily the IETF standard Virtual Router Redundancy Protocol (VRRP). However, Cisco claimed that VRRP was covered by its patent on the Hot Standby Router Protocol (HSRP) and told the OpenBSD developers that it would enforce that patent. Hence, the OpenBSD developers created a new free, open protocol to accomplish essentially the same result without infringing on Cisco’s patent. CARP became available in October 2003 in OpenBSD, and was later added to FreeBSD as well.
A CARP type Virtual IP address (VIP) is shared between nodes of a cluster. One node is master and receives traffic for the IP address, and the other nodes maintain backup status and monitor for heartbeats to see if they need to assume the master role if the previous master fails. Since only one member of the cluster at a time is using the IP address, there is no IP address conflict for CARP VIPs.
In order for failover to work properly it is important that inbound traffic coming to the cluster, such as routed upstream traffic, VPNs, NAT, local client gateway, DNS requests, etc., be sent to a CARP VIP and for outgoing traffic such as Outbound NAT to be sent from a CARP VIP. If traffic is addressed to a node directly and not a CARP VIP, then that traffic will not be picked up by other nodes.
CARP works similarly to VRRP and HSRP, and may even conflict with them in some cases. Heartbeats are sent out on each interface containing a CARP VIP, one heartbeat per VIP per interface. At the default values for skew and base, a VIP sends out heartbeats about once per second. The skew determines which node is master at a given point in time: whichever node transmits heartbeats the fastest assumes the master role. A higher skew value causes heartbeats to be transmitted with more delay, so a node with a lower skew will be the master unless a network or other issue causes its heartbeats to be delayed or lost.
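The timing behind master election can be sketched numerically. CARP sends one advertisement every `base + skew/256` seconds; the node with the shortest interval wins. This is a minimal illustration, assuming two nodes with example skew values of 0 and 100 (a common primary/secondary convention):

```python
# Sketch of CARP advertisement timing and master election.
# Formula: interval = base + skew / 256 seconds per heartbeat.
# Node names and skew values are illustrative assumptions.

def advertisement_interval(base: int, skew: int) -> float:
    """Seconds between CARP heartbeats for one VIP."""
    return base + skew / 256

nodes = {
    "primary": advertisement_interval(base=1, skew=0),      # 1.0 s
    "secondary": advertisement_interval(base=1, skew=100),  # ~1.39 s
}

# The node advertising fastest (smallest interval) assumes the master role.
master = min(nodes, key=nodes.get)
print(master)  # primary
```

This also shows why increasing the Base by one second at a time (as suggested later for latency-sensitive networks) slows all heartbeats proportionally without changing which node wins.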
Never access the firewall GUI, SSH, or other management mechanism using a CARP VIP. For management purposes, only use the actual IP address on the interface of each separate node and not the VIP. Otherwise it cannot be determined beforehand which unit is being accessed.
A High Availability cluster using CARP needs three IP addresses in each subnet along with a separate unused subnet for the Sync interface. For WANs, this means that a /29 subnet or larger is required for an optimal configuration. One IP address is used by each node, plus a shared CARP VIP address for failover. The synchronization interface only requires one IP address per node.
It is technically possible to configure an interface with a CARP VIP as the only IP address in a given subnet, but it is not generally recommended. When used on a WAN, this type of configuration will only allow communication from the primary node to the WAN, which greatly complicates tasks such as updates, package installations, gateway monitoring, or anything that requires external connectivity from the secondary node. It can be a better fit for an internal interface, however internal interfaces do not typically suffer from the same IP address limitations as a WAN, so it is still preferable to configure IP addresses on all nodes.
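The /29 sizing requirement mentioned above can be verified with Python's `ipaddress` module; the prefix below is an arbitrary documentation example, and the three-address requirement comes from one address per node plus one CARP VIP:

```python
# Verify that a /29 provides enough addresses for a two-node HA pair:
# one address per node plus a shared CARP VIP (three in total).
import ipaddress

wan = ipaddress.ip_network("198.51.100.0/29")
usable = list(wan.hosts())  # excludes network and broadcast addresses
needed = 3                  # primary + secondary + CARP VIP
print(len(usable))          # 6
assert len(usable) >= needed
```

A /30 would yield only two usable addresses, which is why the /29 is the smallest prefix that works for an optimal two-node configuration.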
CARP heartbeats utilize multicast and may require special handling on the switches involved with the cluster. Some switches filter, rate limit, or otherwise interfere with multicast in ways that can cause CARP to fail. Also, some switches employ port security methods which may not work properly with CARP.
At a minimum, the switch must:
Allow Multicast traffic to be sent and received without interference on ports using CARP VIPs.
Allow traffic to be sent and received using multiple MAC addresses.
Allow the CARP VIP MAC address to move between ports.
Nearly all problems with CARP failing to properly reflect the expected status are failures of the switch or other layer 2 issues, so be sure the switches are properly configured before continuing.
pfsync enables the synchronization of the firewall state table between cluster nodes. Changes to the state table on the primary are sent to the secondary firewall(s) over the Sync interface, and vice versa. When pfsync is active and properly configured, all nodes will have knowledge of each connection flowing through the cluster. If the master node fails, the backup node will take over and clients will not notice the transition since both nodes knew about the connection beforehand.
pfsync uses multicast by default, though a peer IP address can be defined to force unicast updates for environments with only two firewalls where multicast traffic will not function properly. Any active interface can be used for sending pfsync updates, however a dedicated interface is better for security and performance. pfsync does not support any method of authentication, so if anything other than a dedicated interface is used, any user with local network access can insert states into the state table. In low-throughput environments with less stringent security requirements, use of the LAN interface for this purpose is acceptable. The bandwidth required for state synchronization varies significantly from one environment to another, but could be as high as 10% of the throughput traversing the firewall, depending on the rate of state insertions and deletions in the network.
Failover can still operate without pfsync, but it will not be seamless. Without pfsync if a node fails and another takes over, user connections would be dropped. Users may immediately reconnect through the other node, but they would be disrupted during the transition. Depending on the usage in a particular environment, this may go unnoticed or it could be a significant, but brief, outage.
When pfsync is in use, pfsync settings must be enabled on all nodes participating in state synchronization, including secondary nodes, or it will not function properly.
Traffic for pfsync must be explicitly passed on the Sync interface. The rule must pass the pfsync protocol from a source of the Sync network to any destination. A rule passing all traffic of any protocol would also allow the required traffic, but a more specific rule is more secure.
States in pfSense are bound to specific operating system Interfaces. For example, if WAN is em0, then a state on WAN would be tied to em0. If the cluster nodes have identical hardware and interface assignments then this works as expected. In cases when different hardware is used, this can be a problem. If WAN on one node is em0 but on another node it is igb0, the states will not match and they will not be treated the same.
It is always preferable to have identical hardware, but in cases where this is impractical there is a workaround: adding interfaces to a LAGG abstracts the underlying physical interface. In the above example, WAN would be lagg0 on both nodes and states would be bound to lagg0, even though lagg0 contains em0 on one node and igb0 on the other.
Normally pfSense would allow firewall upgrades without any network disruption. Unfortunately, this isn’t always the case with upgrades as the pfsync protocol can change to accommodate additional functionality. Always check the upgrade guide linked in all release announcements before upgrading to see if there are any special considerations for CARP users.
To make the job of maintaining practically identical firewall nodes easier, configuration synchronization is possible using XML-RPC. When XML-RPC Synchronization is enabled, settings from supported areas are copied to the secondary and activated after each configuration change. XML-RPC Synchronization is optional, but maintaining a cluster is a lot more work without it.
Some areas cannot be synchronized, such as the Interface configuration, but many other areas can: Firewall rules, aliases, users, certificates, VPNs, DHCP, routes, gateways, and more. As a general rule, items specific to hardware or a particular installation, such as Interfaces or values under System > General or System > Advanced do not synchronize. The list of supported areas can vary depending on the version of pfSense in use. For a list of areas that will synchronize, see the checkbox items on System > High Avail Sync in the XMLRPC section. Most packages will not synchronize but some contain their own synchronization settings. Consult package documentation for more details.
Configuration synchronization should use the Sync interface, or if there is no dedicated Sync interface, use the same interface configured for pfsync.
In a two-node cluster, the XML-RPC settings must only be enabled on the primary node; the secondary node must have these settings disabled.
For XML-RPC to function, both nodes must have the GUI running on the same port and protocol, for example: HTTPS on port 443, which is the default setting. The admin account cannot be disabled and both nodes must have the same admin account password.
This section describes a simple three interface HA configuration. The three interfaces are LAN, WAN, and Sync. This is functionally equivalent to a two interface LAN and WAN deployment, with the pfsync interface being used solely to synchronize configuration and firewall states between the primary and secondary firewalls.
This example only covers an IPv4 configuration. High Availability is compatible with IPv6, but it requires static addressing on the firewall interfaces. When preparing to configure HA, if static IPv6 assignments are not available, set IPv6 to None on all interfaces.
The first task is to plan IP address assignments. A good strategy is to use the lowest usable IP address in the subnet as the CARP VIP, the next IP address as the primary firewall interface IP address, and the one after that as the secondary firewall interface IP address. This design is optional; any scheme may be used, but we strongly recommend a consistent and logical scheme to make design and administration simpler.
The WAN addresses will be selected from those assigned by the ISP. For the example in Table WAN IP Address Assignments, the WAN of the HA pair is 198.51.100.0/24, and the addresses 198.51.100.200 through 198.51.100.202 will be used as the WAN IP addresses.
| IP Address        | Usage                         |
|-------------------|-------------------------------|
| 198.51.100.200/24 | CARP shared IP address        |
| 198.51.100.201/24 | Primary node WAN IP address   |
| 198.51.100.202/24 | Secondary node WAN IP address |
The LAN subnet is 192.168.1.0/24. For this example, the LAN IP addresses will be assigned as shown in Table LAN IP Address Assignments.
| IP Address     | Usage                         |
|----------------|-------------------------------|
| 192.168.1.1/24 | CARP shared IP address        |
| 192.168.1.2/24 | Primary node LAN IP address   |
| 192.168.1.3/24 | Secondary node LAN IP address |
There is no shared CARP VIP on this interface because there is no need for one. These IP addresses are used only for communication between the firewalls. For this example, 172.16.1.0/24 is used as the Sync subnet. Only two IP addresses will be used, but a /24 is used to be consistent with the other internal interface (LAN). For the last octet of the IP addresses, use the same last octet as that firewall’s LAN IP address for consistency.
| IP Address    | Usage                          |
|---------------|--------------------------------|
| 172.16.1.2/24 | Primary node Sync IP address   |
| 172.16.1.3/24 | Secondary node Sync IP address |
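The numbering convention used in these tables (VIP first, then primary, then secondary, at consecutive addresses) can be generated programmatically. A minimal sketch with Python's `ipaddress` module; the `ha_addresses` helper is illustrative, and the starting addresses are the ones from the example tables:

```python
# Generate the "VIP, primary, secondary" consecutive-address plan
# used in the example tables above.
import ipaddress

def ha_addresses(first: str) -> dict:
    """Return the VIP and per-node addresses starting at `first`."""
    base = ipaddress.ip_address(first)
    return {"CARP VIP": base, "primary": base + 1, "secondary": base + 2}

# WAN starts at the ISP-assigned .200; LAN uses the lowest usable address.
for name, start in [("WAN", "198.51.100.200"), ("LAN", "192.168.1.1")]:
    plan = ha_addresses(start)
    print(name, {k: str(v) for k, v in plan.items()})
```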
Figure Example HA Network Diagram shows the layout of this example HA pair. The primary and secondary each have identical connections to the WAN and LAN, and a crossover cable between them to connect the Sync interfaces. In this basic example, the WAN switch and LAN switch are still potential single points of failure. Switching redundancy is covered later in this chapter in Layer 2 Redundancy.
Each node requires some basic configuration outside of the actual HA setup. Do not connect both nodes into the same LAN before both nodes have a non-conflicting LAN setup.
Install the OS on the firewalls as usual and assign the interfaces identically on both nodes. Interfaces must be assigned in exactly the same order on all nodes. If the interfaces are not aligned, configuration synchronization and other tasks will not behave correctly. If any adjustments have been made to the interface assignments, they must be replicated identically on both nodes.
Then, connect to the GUI and use the Setup Wizard to configure each firewall with a unique hostname and non-conflicting static IP addresses. Refer back to Setup Wizard if needed.
For example, one node could be “firewall-a.example.com” and the other “firewall-b.example.com”, or a more personalized pair of names.
Avoid naming the nodes “master” or “backup” since those are states and not roles, instead they could be named “primary” and “secondary”.
The default LAN IP address is 192.168.1.1. Each node must be moved to its own address, such as 192.168.1.2 for the primary and 192.168.1.3 for the secondary. This layout is shown in LAN IP Address Assignments. Once each node has a unique LAN IP address, then both nodes may be plugged into the same LAN switch.
Before proceeding, the Sync interfaces on the cluster nodes must be configured. Sync IP Address Assignments lists the addresses to use for the Sync interfaces on each node. Once that has been completed on the primary node, perform it again on the secondary node with the appropriate IPv4 address value.
To complete the Sync interface configuration, firewall rules must be added to both nodes to allow synchronization.
At a minimum, the firewall rules must pass the configuration synchronization traffic (by default, HTTPS on port 443) and pfsync traffic. In most cases, a simple “allow all” style rule is used.
When complete, the rules will look like the example in figure Example Sync Interface Firewall Rules, which also includes a rule to allow ICMP echo (ping) for diagnostic purposes.
The secondary does not need those rules initially, only a rule to allow traffic to the GUI for XML-RPC to function. The full set of rules will synchronize once XML-RPC has been configured.
State synchronization using pfsync must be configured on both the primary and secondary nodes to function.
First on the primary node and then on the secondary, perform the following:
Navigate to System > High Avail Sync
Check Synchronize States
Set Synchronize Interface to SYNC
Set pfsync Synchronize Peer IP to the other node. Set this to
172.16.1.3 when configuring the primary node, or
172.16.1.2 when configuring the secondary node
Configuration synchronization must only be configured on the primary node. Never activate options in this section on the secondary node of a two-member cluster.
On the primary node only, perform the following:
Navigate to System > High Avail Sync
Set Synchronize Config to IP to the Sync interface IP address on the secondary node, 172.16.1.3
Set Remote System Username to admin. This must always be admin; no other user will work!
Set Remote System Password to the admin user account password, and repeat the value in the confirmation box.
Check the boxes for each area to synchronize to the secondary node. For this guide, as with most configurations, all boxes are checked. The Toggle All button may be used to select all of the options at once, rather than selecting them individually.
As a quick confirmation that the synchronization worked, on the secondary node navigate to Firewall > Rules on the SYNC tab. The rules entered on the primary are now there, and the temporary rule is gone.
The two nodes are now linked for configuration synchronization! Changes made to the primary node in supported areas will be synchronized to the secondary whenever a change is made.
Do not make changes to the secondary in areas set to be synchronized! These changes will be overwritten the next time the primary node performs a synchronization.
With configuration synchronization in place, the CARP Virtual IP addresses need only be added to the primary node and they will be automatically copied to the secondary.
Navigate to Firewall > Virtual IPs on the primary node to manage CARP VIPs
Click Add at the top of the list to create a new VIP.
A VIP must be added for each interface handling user traffic, in this case WAN and LAN.
Type: Defines the type of VIP, in this case CARP.
Interface: Defines the interface upon which the VIP will reside, such as WAN.
Address: The Address box is where the IP address value is entered for the VIP. A subnet mask must also be selected, and it must match the subnet mask of the interface IP address. For this example, enter 198.51.100.200 with a /24 mask (see WAN IP Address Assignments).
Virtual IP Password: Sets the password for the CARP VIP. This need only match between the two nodes, which is handled by synchronization. The password and confirmation boxes must both be filled in, and they must match.
VHID Group: Defines the ID for the CARP VIP. A common tactic is to make the VHID match the last octet of the VIP address, so in this case choose 200.
Advertising Frequency: Determines how often CARP heartbeats are sent.
Description: Some text to identify the VIP, such as WAN CARP VIP.
If CARP appears to be too sensitive to latency on a given network, adjusting the Base by adding one second at a time is recommended until stability is achieved.
The above description used the WAN VIP as an example. The LAN VIP would be configured similarly except it will be on the LAN interface and the address will be
192.168.1.1 (See LAN IP Address Assignments).
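The "VHID matches the last octet" convention can be expressed as a small helper. This is an illustrative sketch (the `vhid_for` function is not part of pfSense), using the VIP addresses from this example:

```python
# Derive a VHID from a CARP VIP using the last-octet convention
# described in the text. Valid VHIDs are 1-255.
import ipaddress

def vhid_for(vip: str) -> int:
    """Use the last octet of the VIP address as its VHID."""
    octet = int(ipaddress.ip_address(vip)) & 0xFF
    if not 1 <= octet <= 255:
        raise ValueError("octet 0 is not a valid VHID; pick one manually")
    return octet

print(vhid_for("198.51.100.200"))  # 200 for the WAN VIP
print(vhid_for("192.168.1.1"))     # 1 for the LAN VIP
```

Note the convention breaks down for addresses ending in .0, which is one reason it is a tactic rather than a rule; any unused VHID on the segment works.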
If there are any additional IP addresses in the WAN subnet that will be used for purposes such as 1:1 NAT, port forwards, VPNs, etc, they may be added now as well.
Click Apply Changes after making any edits to the VIPs.
After adding VIPs, check Firewall > Virtual IPs on the secondary node to ensure that the VIPs synchronized as expected.
The Virtual IP addresses on both nodes will look like CARP Virtual IP Address List if the process was successful.
The next step will be to configure NAT so that clients on the LAN will use the shared WAN IP as the address.
Navigate to Firewall > NAT, Outbound tab
Click to select Manual Outbound NAT rule generation
A set of rules will appear that are the equivalent rules to those in place for Automatic Outbound NAT. Adjust the rules for internal subnet sources to work with the CARP IP address instead.
Click the edit icon to the right of the rule
Locate the Translation section of the page
Select the WAN CARP VIP address from the Address drop-down
Change the Description to mention that this rule will NAT LAN to the WAN CARP VIP address
If additional local interfaces are added later, such as a second LAN, DMZ, etc, and that interface uses private IP addresses, then additional manual outbound NAT rules must be added at that time.
When complete, the rule changes will look like those found in Outbound NAT Rules for LAN with CARP VIP
The DHCP server daemons on the cluster nodes need adjustments so that they can work together. The changes will synchronize from the primary to the secondary, so as with the VIPs and Outbound NAT, these changes need only be made on the primary node.
Navigate to Services > DHCP Server, LAN tab.
Set the DNS Server to the LAN CARP VIP, here 192.168.1.1
Set the Gateway to the LAN CARP VIP, here 192.168.1.1
Set the Failover Peer IP to the actual LAN IP address of the secondary node, here 192.168.1.3
Setting the DNS Server and Gateway to a CARP VIP ensures that the local clients are talking to the failover address and not directly to either node. This way if the primary fails, the local clients will continue talking to the secondary node.
The Failover Peer IP allows the daemon to communicate with the peer directly in this subnet to exchange data such as lease information. When the settings synchronize to the secondary, this value is adjusted automatically so the secondary points back to the primary.
HA can also be deployed for firewall redundancy in a multi-WAN configuration. This section details the VIP and NAT configuration needed for a dual WAN HA deployment. This section only covers topics specific to HA and multi-WAN.
For this example, four IP addresses will be used on each WAN. Each firewall needs an IP address, plus one CARP VIP for Outbound NAT, plus an additional CARP VIP for a 1:1 NAT entry that will be used for an internal mail server in the DMZ segment.
Table WAN IP Addressing shows the IP addressing for both WANs. In most environments these will be public IP addresses.
| IP Address     | Usage                            |
|----------------|----------------------------------|
| 198.51.100.200 | Shared CARP VIP for Outbound NAT |
| 198.51.100.201 | Primary firewall WAN             |
| 198.51.100.202 | Secondary firewall WAN           |
| 198.51.100.203 | Shared CARP VIP for 1:1 NAT      |
| 203.0.113.10   | Shared CARP VIP for Outbound NAT |
| 203.0.113.11   | Primary firewall WAN2            |
| 203.0.113.12   | Secondary firewall WAN2          |
| 203.0.113.13   | Shared CARP VIP for 1:1 NAT      |
The LAN subnet is 192.168.1.0/24. For this example, the LAN IP addresses will be assigned as follows.
| IP Address  | Usage                  |
|-------------|------------------------|
| 192.168.1.1 | CARP shared LAN VIP    |
| 192.168.1.2 | Primary firewall LAN   |
| 192.168.1.3 | Secondary firewall LAN |
The DMZ subnet is 192.168.2.0/24. For this example, the DMZ IP addresses will be assigned as follows in Table DMZ IP Address Assignments.
| IP Address  | Usage                  |
|-------------|------------------------|
| 192.168.2.1 | CARP shared DMZ VIP    |
| 192.168.2.2 | Primary firewall DMZ   |
| 192.168.2.3 | Secondary firewall DMZ |
There will be no shared CARP VIP on this interface because there is no need for one. These IP addresses are used only for communication between the firewalls. For this example, 172.16.1.0/24 will be used as the Sync subnet. Only two IP addresses will be used, but a /24 is used to be consistent with the other internal interfaces. For the last octet of the IP addresses, the same last octet as that firewall’s LAN IP is chosen for consistency.
| IP Address | Usage                   |
|------------|-------------------------|
| 172.16.1.2 | Primary firewall Sync   |
| 172.16.1.3 | Secondary firewall Sync |
The NAT configuration when using HA with Multi-WAN is the same as HA with a single WAN. Ensure that only CARP VIPs are used for inbound traffic or routing. See Network Address Translation for more information on NAT configuration.
With Multi-WAN, a firewall rule must be in place to pass traffic to local networks using the default gateway. Otherwise, traffic attempting to reach the CARP address or to cross from LAN to DMZ will instead go out a WAN connection.
A rule must be added at the top of the firewall rules for all internal interfaces which will direct traffic for all local networks to the default gateway. The important part is that the gateway needs to be default for this rule and not one of the failover or load balance gateway groups. The destination for this rule would be the local LAN network, or an alias containing any locally reachable networks.
Due to the additional WAN and DMZ elements, a diagram of this layout is much more complex as can be seen in Figure Diagram of Multi-WAN HA with DMZ.
Since using HA is about high availability, thorough testing before placing a cluster into production is a must. The most important part of that testing is making sure that the HA peers will failover gracefully during system outages.
If any actions in this section do not work as expected, see High Availability Troubleshooting.
On both systems, navigate to Status > CARP (failover). If everything is working correctly, the primary will show MASTER for the status of all CARP VIPs and the secondary will show BACKUP.
If either instead shows DISABLED, click the Enable CARP button and then refresh the page.
If an interface shows INIT, it means the interface containing the CARP VIP does not have a link. Connect the interface to a switch, or at least to the other node. If the interface will not be used for some time, remove the CARP VIP from the interface, since a VIP on a down interface will interfere with normal CARP operation.
Navigate to key locations on the secondary node, such as Firewall > Rules and Firewall > NAT and ensure that rules created only on the primary node are being replicated to the secondary node.
If the example earlier in this chapter was followed, the “temp” firewall rule on the pfsync interface would be replaced by the rule from the primary.
If DHCP failover was configured, its status can be checked at Status > DHCP Leases. A new section will appear at the top of the page containing the status of the DHCP Failover pool, as in Figure DHCP Failover Pool Status.
Now for the real failover test. Before starting, make sure that a local client behind the CARP pair on LAN can connect to the Internet with both pfSense firewalls online and running. Once that is confirmed to work, it is an excellent time to make a backup.
For the actual test, unplug the primary node from the network or shut it down temporarily. The client will be able to keep loading content from the Internet through the secondary node. Check Status > CARP (failover) again on the backup and it will now report that it is MASTER for the LAN and WAN CARP VIPs.
Now bring the primary node back online and it will regain its role as MASTER, and the backup system will demote itself to BACKUP once again. At any point during this process, Internet connectivity will still work properly.
Test the HA pair in as many failure scenarios as possible. Additional tests include:
Unplug the WAN or LAN cable
Pull the power plug of the primary
Disable CARP on the primary using both the temporary disable feature and maintenance mode
Test with each system individually (power off secondary, then power back on and shut down the primary)
Download a file or try streaming audio/video during the failover
Run a continuous ICMP echo request (ping) to an Internet host during the failover
As mentioned earlier, only CARP VIPs provide redundancy for addresses directly handled by the firewall, and they can only be used in conjunction with NAT or services on the firewall itself. Redundancy can also be provided for routed public IP subnets with HA. This section describes this type of configuration, which is common in large networks, ISP and wireless ISP networks, and data center environments.
At least a /29 public IP block for the WAN side of pfSense is necessary, which provides six usable IP addresses. Only three are required for a two-firewall deployment, but this is the smallest IP subnet that will accommodate three IP addresses. Each firewall requires one IP address, and at least one CARP VIP is needed on the WAN side.
The second public IP subnet will be routed to one of the CARP VIPs by the ISP, data center, or upstream router. Because this subnet is being routed to a CARP VIP, the routing will not be dependent upon a single firewall. For the depicted example configuration in this chapter, a /24 public IP subnet will be used and it will be split into two /25 subnets.
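Splitting a routed block as described above is straightforward subnet math; here is a sketch using Python's `ipaddress` module, with the reserved documentation prefix 192.0.2.0/24 standing in for the routed block (the actual prefix would come from the ISP or data center):

```python
# Split a routed /24 into two /25 subnets, as in the example above.
# 192.0.2.0/24 is a placeholder documentation prefix, not from this setup.
import ipaddress

routed = ipaddress.ip_network("192.0.2.0/24")
halves = list(routed.subnets(new_prefix=25))
for half in halves:
    # Subtract network and broadcast addresses to get usable hosts.
    print(half, "-", half.num_addresses - 2, "usable hosts")
```

Because the upstream routes the whole block to a CARP VIP rather than to either node, the internal split can be rearranged later without involving the ISP.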
The example network depicted here is a data center environment consisting of two pfSense firewalls with four interfaces each: WAN, LAN, DBDMZ, and pfsync. This network contains a number of web and database servers. It is not based on any real network, but there are countless production deployments similar to this.
The WAN side connects to the upstream network, either the ISP, data center, or upstream router.
The WEB segment in this network uses the “LAN” interface, renamed. It contains web servers, so it has been named WEB, but it could be called DMZ, SERVERS, or anything desired.
The DBDMZ segment is an OPT interface and contains the database servers. It is common to segregate the web and database servers into two networks in hosting environments. The database servers typically do not require direct access from the Internet, and hence are less subject to compromise than web servers.
The Sync network in this diagram is used to replicate pfSense configuration changes via XML-RPC and for pfsync to replicate state table changes between the two firewalls. As described earlier in this chapter, a dedicated interface for this purpose is recommended.
Figure Diagram of HA with Routed IPs illustrates this network layout, including all routable IP addresses, the WEB network, and the Database DMZ.
Segments containing database servers typically do not need to be publicly accessible, and hence would more commonly use private IP subnets, but the example illustrated here can be used regardless of the function of the two internal subnets.
The diagrams earlier in this chapter did not describe layer 2 (switch) redundancy, to avoid throwing too many concepts at readers simultaneously. This section covers the layer 2 design elements to be considered when planning a redundant network. This chapter assumes a two system deployment, though this scales to as many installations as required.
If both redundant pfSense firewalls are plugged into the same switch on any interface, that switch becomes a single point of failure. To avoid this single point of failure, the best choice is to deploy two switches for each interface (other than the dedicated pfsync interface).
Figure Example HA Network Diagram is network-centric and does not show the switch infrastructure. Figure Diagram of HA with Redundant Switches illustrates how that environment looks with a redundant switch infrastructure.
When using multiple switches, the switches should be interconnected. As long as there is a single connection between the two switches, and no bridge on either of the firewalls, this is safe with any type of switch. When using bridging, or when multiple interconnections exist between the switches, care must be taken to avoid layer 2 loops. In that case, a managed switch capable of Spanning Tree Protocol (STP) is required to detect and block ports that would otherwise create switch loops. With STP, if an active link dies, e.g. due to a switch failure, a backup link can automatically be brought up in its place.
pfSense also has support for lagg(4) link aggregation and link failover interfaces which allows multiple network interfaces to be plugged into one or more switches for increased fault tolerance. See LAGG (Link Aggregation) for more information on configuring link aggregation.
It is more difficult to obtain host redundancy for critical systems inside the firewall. Each system could have two network cards and a connection to each group of switches using Link Aggregation Control Protocol (LACP) or similar vendor-specific functionality. Servers could also have multiple network connections, and depending on the OS it may be possible to run CARP or a similar protocol on a set of servers so that they would be redundant as well. Providing host redundancy is more specific to the capabilities of the switches and server operating systems, which is outside the scope of this book.
When trying to design a fully redundant network, there are many single points of failure that sometimes get missed. Depending on the level of uptime the design must achieve, there is much more to consider than a simple switch failure. Here are a few more examples of redundancy on a wider scale:
Supply isolated power for each redundant segment.
Use separate breakers for redundant systems.
Use multiple UPS banks/generators.
Use multiple power providers, entering opposite sides of the building where possible.
Even a Multi-WAN configuration is no guarantee of Internet uptime.
Use multiple Internet connection technologies (DSL, Cable, Fiber, Wireless).
If any two carriers use the same pole/tunnel/path, they could both be knocked out at the same time.
Have backup cooling, redundant chillers or a portable/emergency air conditioner.
Consider placing the second set of redundant equipment in another room, another floor, or another building.
Have a duplicate setup in another part of town or another city.
I hear hosting is cheap on Mars, but the latency is killer.
High availability is not currently compatible with bridging in any native capacity considered reliable or suitable for production use; making it work requires significant manual intervention. The details of the process can be found in High Availability.
If there are a large number of CARP VIPs on a segment, this can lead to a lot of multicast traffic. One heartbeat per second is sent per CARP VIP. To reduce this traffic, additional VIPs may be “stacked” on top of one CARP VIP on an interface. First, pick one CARP VIP to be the “main” VIP for the interface. Then, change the other CARP VIPs in that same subnet to be an IP Alias type VIP, with the “main” CARP VIP interface selected to be their Interface on the VIP configuration.
This not only reduces the heartbeats that will be seen on a given segment, but it also causes all of the IP alias VIPs to change status along with the “main” CARP VIP, reducing the likelihood that a layer 2 issue will cause individual CARP VIPs to not fail over as expected.
IP Alias VIPs do not normally synchronize via XML-RPC configuration synchronization; however, IP Alias VIPs set to use CARP VIP interfaces in this manner will synchronize.
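As a rough illustration of the multicast reduction described above, the advertisement rate can be estimated from the CARP timing parameters: a master advertises roughly every advbase + advskew/256 seconds, one stream per CARP VIP. The function below is an illustrative sketch, not pfSense code; the VIP counts are hypothetical.

```python
# Illustrative sketch (not part of pfSense): estimate CARP heartbeat
# traffic on a segment. A CARP master advertises approximately every
# advbase + advskew/256 seconds, with one advertisement stream per
# CARP VIP on the segment.

def heartbeats_per_second(num_carp_vips, advbase=1, advskew=0):
    """Approximate advertisements per second seen on the segment."""
    interval = advbase + advskew / 256  # seconds between advertisements
    return num_carp_vips / interval

# 20 standalone CARP VIPs at the default advbase of 1 second:
print(heartbeats_per_second(20))   # 20 advertisements per second

# The same 20 addresses as 1 "main" CARP VIP plus 19 stacked IP Alias
# VIPs: only the main CARP VIP advertises.
print(heartbeats_per_second(1))    # 1 advertisement per second
```

This makes the trade-off concrete: stacking the aliases collapses twenty advertisement streams into one without reducing the number of usable addresses.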
If multiple subnets are required on a single interface with HA, this may be accomplished using IP Aliases. As with the main interface IP addresses, we recommend each firewall have an IP address inside the additional subnet, for a total of at least three IPs per subnet. Separate IP alias entries must be added to each node inside the new subnet, ensuring that their subnet masks match the actual subnet mask for the new subnet. IP alias VIPs that are directly on an interface do not sync, so this is safe.
Once the IP Alias VIP has been added to both nodes to gain a foothold in the new subnet, CARP VIPs may then be added using IP addresses from the new subnet.
It is possible to omit the IP Aliases and use a CARP VIP directly in the other subnet so long as communication between the additional subnet and both individual HA nodes is not required.
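The address plan described above (one IP Alias per node plus a CARP VIP, all carrying the real subnet mask) can be sanity-checked mechanically. The sketch below uses the Python ipaddress module with hypothetical addresses; it is not pfSense code, only an illustration of the mask rules from the preceding paragraphs.

```python
# Illustrative sketch using hypothetical addresses: verify that the
# per-node IP Alias addresses and the CARP VIP all live inside the new
# subnet and carry its real mask (not /32).
import ipaddress

new_subnet = ipaddress.ip_network("10.7.0.0/24")  # hypothetical subnet

addresses = {
    "primary IP Alias":   ipaddress.ip_interface("10.7.0.2/24"),
    "secondary IP Alias": ipaddress.ip_interface("10.7.0.3/24"),
    "CARP VIP":           ipaddress.ip_interface("10.7.0.1/24"),
}

for name, iface in addresses.items():
    # Each entry must use the subnet's actual mask and fall inside it...
    assert iface.network == new_subnet, f"{name}: wrong mask or subnet"
    # ...and must not be configured as a /32.
    assert iface.network.prefixlen != 32, f"{name}: /32 mask is wrong"

print("three addresses in the new subnet, masks consistent")
```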
High availability configurations can be complex, and with so many different ways to configure a failover cluster, it can be tricky to get things working properly. In this section, some common (and not so common) problems will be discussed and hopefully solved for the majority of cases. If issues are still present after consulting this section, there is a dedicated CARP/VIPs board on the pfSense Forum.
Before proceeding, take the time to check all members of the HA cluster to ensure that they have consistent configurations. Often, it helps to walk through the example setup, double checking all of the proper settings. Repeat the process on the secondary node, and watch for any places where the configuration must be different on the secondary. Be sure to check the CARP status (Check CARP status) and ensure CARP is enabled on all cluster members.
Errors relating to HA will be logged in Status > System Logs, on the System tab. Check those logs on each system involved to see if there are any messages relating to XMLRPC sync, CARP state transitions, or other related errors.
There are three common misconfigurations that happen which prevent HA from working properly.
A different VHID must be used on each CARP VIP created on a given interface or broadcast domain. With a single HA pair, input validation will prevent duplicate VHIDs. Unfortunately it isn’t always that simple. CARP is a multicast technology, and as such anything using CARP on the same network segment must use a unique VHID. VRRP uses a protocol similar to CARP, so also ensure there are no conflicts with VRRP VHIDs, such as when the ISP or another router on the local network is using VRRP.
The best way around this is to use a unique set of VHIDs. If a known-safe private network is in use, start numbering at 1. On a network where VRRP or CARP are conflicting, consult with the administrator of that network to find a free block of VHIDs.
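A conflict check like the one described above can be expressed as a simple set intersection. The sketch below is illustrative only; the VHID values are hypothetical, and in practice the neighbor list would come from observing CARP/VRRP traffic on the segment.

```python
# Illustrative sketch: detect VHID conflicts on a shared broadcast
# domain. All VHID values here are hypothetical examples.

local_vhids = {1, 2, 3}      # VHIDs used by this HA pair
neighbor_vhids = {3, 10}     # e.g. an ISP router running VRRP

conflicts = local_vhids & neighbor_vhids
if conflicts:
    print("conflicting VHIDs, renumber these:", sorted(conflicts))
else:
    print("no VHID conflicts on this segment")
```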
Check that all systems involved are properly synchronizing their clocks and have valid time zones, especially if running in a Virtual Machine. If the clocks are too far apart, some synchronization tasks like DHCP failover will not work properly.
The real subnet mask must be used for a CARP VIP, not /32. This must match the subnet mask for the IP address on the interface to which the CARP IP is assigned.
The interface upon which the CARP VIP resides must already have another IP defined directly on the interface (VLAN, LAN, WAN, OPT) before it can be utilized.
There are a few reasons why this error turns up in the system logs, some more worrisome than others.
If CARP is not working properly when this error is present, it could be due to a configuration mismatch. Ensure that for a given VIP, the VHID, password, and IP address/subnet mask all match.
If the settings appear to be proper and CARP still does not work while generating this error message, then there may be multiple CARP instances on the same broadcast domain. Disable CARP and monitor the network with tcpdump (Packet Capturing) to check for other CARP or CARP-like traffic, and adjust VHIDs appropriately.
If CARP is working properly, and this message is in the logs when the system boots up, it may be disregarded. It is normal for this message to be seen when booting, as long as CARP continues to function properly (primary shows MASTER, secondary shows BACKUP for status).
This will happen if the secondary node cannot see the CARP advertisements from the primary node. Check for firewall rules blocking the traffic, connectivity trouble, and switch configuration problems. Also check the system logs for any relevant errors that may lead to a solution. If this is encountered in a Virtual Machine (VM) product such as ESX, see Issues inside of Virtual Machines (ESX).
In some cases, this may happen normally for a short period after a system comes back online. However, certain hardware failures or other error conditions can cause a node to silently take on a high advskew of 240 in order to signal that it still has a problem and should not become master. This can be checked from the GUI, or via the shell or Diagnostics > Command.
In the GUI, this condition is printed in an error message on Status > CARP.
From the shell or Diagnostics > Command, run the following command to check for a demotion:
# sysctl net.inet.carp.demotion
net.inet.carp.demotion: 240
If the value is greater than 0, the node has demoted itself.
In that case, isolate the firewall, check its network connections, and perform further hardware testing.
If the demotion value is 0 and the primary node still appears to be demoting itself to BACKUP or is flapping, check the network to ensure there are no layer 2 loops. If the firewall receives back its own heartbeats from the switch, it can also trigger a change to BACKUP status.
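The decision logic above can be sketched by parsing the sysctl output shown earlier. The sample_output string below is a hypothetical capture; on a real firewall the value would come from running sysctl net.inet.carp.demotion at the shell.

```python
# Illustrative sketch: interpret the output of the sysctl command shown
# above. The sample output string is a hypothetical capture.

def parse_demotion(sysctl_output):
    """Return the counter from a 'net.inet.carp.demotion: N' line."""
    return int(sysctl_output.split(":")[1])

sample_output = "net.inet.carp.demotion: 240"
demotion = parse_demotion(sample_output)

if demotion > 0:
    # Isolate the firewall and test its hardware and network links.
    print(f"node has demoted itself (demotion = {demotion})")
else:
    # With demotion at 0, look elsewhere, e.g. for layer 2 loops.
    print("no self-demotion detected")
```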
When using HA inside of a Virtual Machine, especially VMware ESX, some special configurations are needed:
Enable promiscuous mode on the vSwitch.
Enable “MAC Address changes”.
Enable “Forged transmits”.
If a Virtual Distributed Switch is in use, a port group can be made for the firewall interfaces with promiscuous mode enabled, and a separate non-promiscuous port group for other hosts. This has been reported to work by users on the forum as a way to strike a balance between the requirements for letting CARP function and for securing client ports.
If a VDS (Virtual Distributed Switch) was used in 4.0 or 4.1 and the host was upgraded from 4.0 to 4.1 or 5.0, the VDS will not properly pass CARP traffic. A new VDS created on 4.1 or 5.0 will work, but the upgraded VDS will not.
It is reported that disabling promiscuous mode on the VDS and then re-enabling it will resolve the issue.
If port mirroring is enabled on a VDS it will break promiscuous mode. To fix it, disable and then re-enable promiscuous mode.
If a physical HA cluster is connected to a switch with an ESX host using multiple ports on the ESX host (lagg group or similar), and only certain devices/IPs are reachable by the target VM, then the port group settings may need adjusting in ESX to set the load balancing for the group to hash based on IP, not the originating interface.
Side effects of having that setting incorrectly include:
Traffic only reaching the target VM in promiscuous mode on its NIC.
Inability to reach the CARP VIP from the target VM when the “real” IP address of the primary firewall can be reached.
Port forwards or other inbound connections to the target VM work from some IP addresses and not others.
Self-demotion in CARP relies on the loss of link on a switch port. As such, if a primary and secondary firewall instance are on separate ESX units and the primary unit loses a switch port link and does not expose that to the VM, CARP will stay MASTER on all of its VIPs there and the secondary will also believe it should be MASTER. One way around this is to script an event in ESX that will take down the switch port on the VM if the physical port loses link. There may be other ways around this in ESX as well.
Use e1000 NICs (em(4)), not the ed(4) NICs, or CARP VIPs will never leave the init state.
Setting “Promiscuous mode: Allow All” on the relevant interfaces of the VM allows CARP to function on any interface type (Bridged, Host-Only, Internal).
If the units are plugged into separate switches, ensure that the switches are properly trunking and passing broadcast/multicast traffic.
Some switches have broadcast/multicast filtering, limiting, or “storm control” features that can break CARP.
Some switches have broken firmware that can cause features like IGMP Snooping to interfere with CARP.
If a switch on the back of a modem/CPE is in use, try a real switch instead. These built-in switches often do not properly handle CARP traffic. Often plugging the firewalls into a proper switch and then uplinking to the CPE will eliminate problems.
Double check the following items when problems with configuration synchronization are encountered:
The username must be admin on all nodes.
The password in the configuration synchronization settings on the primary must match the password on the backup.
The WebGUI must be on the same port on all nodes.
The WebGUI must be using the same protocol (HTTP or HTTPS) on all nodes.
Traffic must be permitted to the WebGUI port on the interface which handles the synchronization traffic.
The pfsync interface must be enabled and configured on all nodes.
Verify that only the primary sync node has the configuration synchronization options enabled.
Ensure no IP address is specified in the Synchronize Config to IP on the secondary node.
Ensure the clocks on both nodes are current and are reasonably accurate.
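The checklist above can be restated as a set of consistency assertions. The sketch below uses a hypothetical per-node settings summary; it is not the actual pfSense configuration format, only an illustration of what must match between the nodes.

```python
# Illustrative sketch: express the configuration synchronization
# checklist as assertions over a hypothetical settings summary.

primary = {"user": "admin", "gui_port": 443, "gui_proto": "https",
           "sync_to_ip": "172.16.1.3", "pfsync_enabled": True}
secondary = {"user": "admin", "gui_port": 443, "gui_proto": "https",
             "sync_to_ip": "", "pfsync_enabled": True}

# The username must be admin on all nodes.
assert primary["user"] == secondary["user"] == "admin"
# The WebGUI must use the same port and protocol on all nodes.
assert primary["gui_port"] == secondary["gui_port"]
assert primary["gui_proto"] == secondary["gui_proto"]
# The pfsync interface must be enabled on all nodes.
assert primary["pfsync_enabled"] and secondary["pfsync_enabled"]
# Only the primary points config sync at a peer; the secondary must
# leave Synchronize Config to IP empty.
assert primary["sync_to_ip"] and not secondary["sync_to_ip"]

print("basic configuration synchronization checks passed")
```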
If trouble is encountered reaching CARP VIPs when dealing with Multi-WAN, double check that a rule is present like the one mentioned in Firewall Configuration.