There are two servers running Ubuntu 20.04.4 LTS, each with Postfix deployed. The mail servers only send mail, and both are already fully configured. The task is to combine them into a fault-tolerant Master/Slave cluster with a floating virtual IP address managed by keepalived. The main goal is to accept mail on a single IP address and provide redundancy for Postfix.
keepalived is open source software written in C that provides high availability and load balancing for Linux clusters. High availability is handled by its implementation of the VRRP protocol, and load balancing relies on the IPVS kernel module.
VRRP (Virtual Router Redundancy Protocol) is a network protocol designed to increase the availability of routers acting as the default gateway. It does this by combining a group of routers into one virtual router and assigning them a shared IP address, which hosts on the network use as their default gateway.
IPVS (IP Virtual Server) is a Linux kernel module that implements transport-layer load balancing, commonly called Layer 4 switching. In other words, it acts as an L4 router.
Installing keepalived
keepalived is available in the official repositories of popular distributions. Deb:
apt update
apt install keepalived
Rpm:
yum install keepalived
Configuring the cluster
- The first server has the IP address 10.12.49.35
- The second server has the IP address 10.12.49.36
We allocate a new IP address to serve as the virtual one: 10.12.49.37.
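Once both nodes are running keepalived with the configuration described in this section, the virtual address should be present on exactly one of them. The following is a minimal sketch for checking this from the shell. It assumes the iproute2 ip tool and the ens160 interface used in this article; the helper name has_vip is hypothetical and not part of keepalived.

```shell
#!/bin/sh
# has_vip VIP
# Reads the output of "ip -o -4 addr show" on stdin and succeeds (exit 0)
# if the given IPv4 address is configured, fails (exit 1) otherwise.
has_vip() {
    awk -v vip="$1" '
        {
            # In "ip -o" output each address appears as: inet ADDR/PREFIX
            for (i = 1; i < NF; i++) {
                if ($i == "inet") {
                    split($(i + 1), parts, "/")
                    if (parts[1] == vip) found = 1
                }
            }
        }
        END { exit(found ? 0 : 1) }'
}

# Usage on a real node (not executed here):
#   ip -o -4 addr show dev ens160 | has_vip 10.12.49.37 && echo "this node holds the VIP"
```

Running the usage line on both servers should report the address on the current master only.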
Edit the configuration file /etc/keepalived/keepalived.conf on the first server:
cd /etc/keepalived
vim keepalived.conf
Contents:
vrrp_script chk_service {
    script "/opt/postfix.check.sh"
    interval 3
    weight 100
    user postfix
}
vrrp_instance VI_postfix {
    interface ens160
    state MASTER
    priority 100
    virtual_router_id 173
    advert_int 1
    unicast_src_ip 10.12.49.35
    unicast_peer {
        10.12.49.36
    }
    virtual_ipaddress {
        10.12.49.37
    }
    track_script {
        chk_service
    }
}
Edit the configuration file /etc/keepalived/keepalived.conf on the second server:
cd /etc/keepalived
vim keepalived.conf
Contents:
vrrp_script chk_service {
    script "/opt/postfix.check.sh"
    interval 3
    weight 100
    user postfix
}
vrrp_instance VI_postfix {
    interface ens160
    state BACKUP
    priority 100
    virtual_router_id 173
    advert_int 1
    unicast_src_ip 10.12.49.36
    unicast_peer {
        10.12.49.35
    }
    virtual_ipaddress {
        10.12.49.37
    }
    track_script {
        chk_service
    }
}
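The two files are meant to differ only in state, unicast_src_ip and unicast_peer, while virtual_router_id and virtual_ipaddress must be identical on both servers. As a rough sketch, the shared values can be compared with a few lines of shell. The helper names and the peer file path in the usage comment are hypothetical, and the parsing assumes one keyword per line, as in the files above.

```shell
#!/bin/sh
# vrrp_value KEY FILE
# Prints the first value of a single-value keyword such as virtual_router_id.
vrrp_value() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# vips FILE
# Prints the addresses listed inside virtual_ipaddress { ... }, sorted.
vips() {
    awk '
        $1 == "virtual_ipaddress" { inblk = 1; next }
        inblk && $1 == "}"        { inblk = 0 }
        inblk                     { print $1 }' "$1" | sort
}

# check_pair FILE_A FILE_B
# Succeeds only if both files agree on virtual_router_id and on the VIP list.
check_pair() {
    rc=0
    if [ "$(vrrp_value virtual_router_id "$1")" != "$(vrrp_value virtual_router_id "$2")" ]; then
        echo "virtual_router_id differs"
        rc=1
    fi
    if [ "$(vips "$1")" != "$(vips "$2")" ]; then
        echo "virtual_ipaddress differs"
        rc=1
    fi
    return $rc
}

# Usage (the second path is a hypothetical copy of the peer's file):
#   check_pair /etc/keepalived/keepalived.conf /tmp/peer-keepalived.conf
```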
Let's go through what we have here.
- vrrp_script: the service check block; chk_service is the name by which it is referenced
- script: any script will do; it must exit with code 0 if postfix is working normally and with any other code if it is not
- interval: how often the check script runs, in seconds
- weight: the amount by which the check adjusts the server's priority; with a positive value, as here, the priority is raised by this amount while the check succeeds, and with a negative value it would be lowered while the check fails
- user: the user the script runs as, here postfix
- vrrp_instance: a VRRP instance, which we name VI_postfix
- interface: the name of the network interface
- state: the initial state at startup, MASTER on the main server and BACKUP on the second
- priority: the priority used to elect the master; the node with the higher value wins
- virtual_router_id: the identifier of the VRRP instance; it must be the same on all servers in the instance
- advert_int: the interval, in seconds, at which the master sends advertisements to the other nodes; if the backups stop receiving them for about three intervals, a new master is elected based on priority
- unicast_src_ip: the source address of VRRP packets
- unicast_peer: the destination addresses of VRRP packets
- virtual_ipaddress: the block of virtual IP addresses that are active on the master; it must match on all servers in the VRRP instance
- track_script: monitoring via a script, which must return 0 when the check succeeds
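The article does not show /opt/postfix.check.sh itself, so the following is only a sketch of what such a check could look like, not the author's script. It treats Postfix as healthy when the local SMTP port answers with a 220 greeting. The function names are hypothetical, and check_smtp assumes an nc binary that supports the -w timeout option.

```shell
#!/bin/sh
# banner_ok
# Reads the first line of an SMTP greeting from stdin and succeeds
# only if it is a 220 reply ("220 ..." or multi-line "220-...").
banner_ok() {
    IFS= read -r line || [ -n "$line" ] || return 1
    case $line in
        "220 "*|220-*) return 0 ;;
        *)             return 1 ;;
    esac
}

# check_smtp HOST PORT
# Hypothetical helper: opens a TCP connection with nc (assumed to be
# installed) and passes the greeting to banner_ok. The exit status of
# the pipeline is the exit status of banner_ok.
check_smtp() {
    nc -w 2 "$1" "$2" </dev/null 2>/dev/null | banner_ok
}

# In a real check script the last line would be the check itself, so that
# its exit status becomes the status keepalived sees:
#   check_smtp 127.0.0.1 25
```

The check should finish well within the configured interval of 3 seconds, which is why the connection timeout in this sketch is kept at 2 seconds.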
Some examples use the authentication option. It is obsolete: authentication was removed from VRRPv2 by RFC 3768 in 2004. Avoid using this option.
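To make the priority, weight and advert_int settings from the list above more concrete, here is a small arithmetic sketch. It assumes the documented keepalived behaviour for track scripts (a positive weight is added while the check succeeds, a negative weight is applied while it fails) and the VRRPv2 timing formula from RFC 3768, where the master down interval is 3 * advert_int plus a skew of (256 - priority) / 256 seconds. The function names are hypothetical.

```shell
#!/bin/sh
# effective_priority BASE WEIGHT CHECK_OK
# CHECK_OK is 1 if the track script succeeded, 0 if it failed.
effective_priority() {
    base=$1; weight=$2; ok=$3
    if [ "$weight" -gt 0 ] && [ "$ok" -eq 1 ]; then
        echo $((base + weight))      # positive weight: bonus while healthy
    elif [ "$weight" -lt 0 ] && [ "$ok" -eq 0 ]; then
        echo $((base + weight))      # negative weight: penalty while failing
    else
        echo "$base"
    fi
}

# master_down_interval PRIORITY ADVERT_INT
# Seconds a backup waits without advertisements before taking over (RFC 3768).
master_down_interval() {
    awk -v p="$1" -v a="$2" 'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 }'
}

effective_priority 100 100 1    # healthy node in this article's setup: 200
effective_priority 100 100 0    # node whose check fails: 100
master_down_interval 200 1      # 3.219
```

With priority 100 and weight 100 on both servers, a node whose check fails drops to 100 while its healthy peer stays at 200, which is what moves the virtual address to the peer.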
- Keepalived Configuration Manual Page
keepalived.conf(5) Keepalived Configuration's Manual keepalived.conf(5)
NAME
keepalived.conf - configuration file for Keepalived
Note: This documentation MUST be considered as THE exhaustive source of information in order to configure Keepalived. This documentation is supported and maintained by Keepalived Core-Team.
DESCRIPTION
keepalived.conf is the configuration file which describes all the Keepalived keywords. Keywords are placed in hierarchies of blocks and subblocks, each layer being delimited by '{' and '}' pairs.
Comments start with '#' or '!' to the end of the line and can start anywhere in a line.
The keyword 'include' and variants allow inclusion of other configuration files from within the main configuration file, or from subsequently included files.
The format of the include directive is:
include FILENAME
FILENAME can be a fully qualified or relative pathname, and can include wildcards, including csh style brace expressions such as "{foo/{,cat,dog},bar}" if glob() supports them.
After opening an included file, the current directory is set to the directory of the file itself, so any relative paths included from a file are relative to the directory of the including file itself.
The include variants add additional include checks to the current include_check level (see below). The variants are:
includer FILENAME - same as include_check readable
includem FILENAME - same as include_check match
includew FILENAME - same as include_check wildcard_match
includeb FILENAME - same as include_check brace_match
includea FILENAME - all include_check checks
NOTE: If the libc glob() function does not support GLOB_ALTDIRFUNC (e.g. Musl libc as on Alpine Linux etc.), then only includea, includer and includew of the above options will work.
Why do we want to allow errors?
Suppose a configuration has optional files in /etc/keepalived/conf.d, then include_/etc/keepalived/conf.d/* could be specified, but it should not error if there are no files in the directory; in this case includer should be used. Otherwise it is sensible to use includea.
include handling will not work if the include line uses conditional configuration or parameter substitution, since the detection of the include keywords is done before the processing of conditional configuration and parameter substitution.
The basic include keyword is retained for backward compatibility, since it does not produce config errors if files could not be opened etc.
PARAMETER SYNTAX
<BOOL> is one of on|off|true|false|yes|no
<TIMER> is a time value in seconds, including fractional seconds, e.g. 2.71828 or 3; resolution of timer is microseconds.
SCRIPTS
There are three classes of scripts that can be configured to be executed.
(a) Notify scripts that are run when a vrrp instance or vrrp group changes state, or a virtual server quorum changes between up and down.
(b) vrrp tracking scripts that will cause vrrp instances to go down if they exit with a non-zero exit status, or if a weight is specified will add or subtract the weight to/from the priority of that vrrp instance.
(c) LVS checker misc scripts that will cause a real server to be configured down if they exit with a non-zero status.
By default the scripts will be executed by user keepalived_script if that user exists, or if not by root, but for each script the user/group under which it is to be executed can be specified.
There are significant security implications if scripts are executed with root privileges, especially if the scripts themselves are modifiable or replaceable by a non root user. Consequently, security checks are made at startup to ensure that if a script is executed by root, then it cannot be modified or replaced by a non root user.
All scripts should be written so that they will terminate on receipt of a SIGTERM signal. Scripts will be sent SIGTERM if their parent terminates, or if keepalived is awaiting the script's exit status and it has run for too long.
Quoted strings
Quoted strings are specified between " or ' characters and strings are delimited by whitespace. In the examples below the ' characters are not part of the strings and should not be specified:
'abcd" efg h jkl "mnop'
will be the single string:
'abcd efg h jkl mnop'
whereas:
'abcd "efg h jkl" mnop'
will be the three strings:
'abcd', 'efg h jkl' and 'mnop'
i.e. the " and ' characters are removed and any intervening whitespace is retained.
Quoted strings can also have escaped characters, like the shell. \a, \b, \E, \f, \n, \r, \t, \v, \nnn and \xXX (where nnn is up to 3 octal digits, and XX is any sequence of hex digits) and \cC (which produces the control version of character C) are all supported. \C for any other character C is just treated as an escaped version of character C, so \\ is a \ character and \" will be a " character, but it won't start or terminate a quoted string.
For specifying scripts with parameters, unquoted spaces will separate the parameters. If it is required for a parameter to contain a space, it should be enclosed in single quotes (').
CONFIGURATION PARSER
Traditionally the configuration file parser has not been one of the strengths of keepalived. A lot of effort has been put into correcting this even if this is not the primary goal of the project.
TOP HIERARCHY
The Keepalived configuration file is articulated around a set of configuration blocks. Each block focuses on and targets a specific daemon family feature.
These features are: GLOBAL CONFIGURATION BFD CONFIGURATION VRRPD CONFIGURATION LVS CONFIGURATION GLOBAL CONFIGURATION contains subblocks of Global definitions, Linkbeat interfaces, Interface up/down transition delays, Static track groups, Static ad- dresses, Static routes, and Static rules Global definitions # Following are global daemon facilities for running # keepalived in a separate network namespace: # -- # Set the network namespace to run in. # The directory /run/keepalived will be created as an # unshared mount point, for example for pid files. # syslog entries will have _NAME appended to the ident. # Note: the namespace cannot be changed on a configuration reload. net_namespace NAME # Add the IPVS configuration in the specified net namespace. It allows to easily # split the VIP traffic on a given namespace and keep the healthchecks traffic # in another namespace. If NAME is not specified, then the default namespace # will be used. net_namespace_ipvs NAME # ipsets wasn't network namespace aware until Linux 3.13, and so # if running with an earlier version of the kernel, by default # use of ipsets is disabled if using a namespace and vrrp_ipsets # has not been specified. This options overrides the default and # allows ipsets to be used with a namespace on kernels prior to 3.13. namespace_with_ipsets # If multiple instances of keepalived are run in the same namespace, # this will create pid files with NAME as part of the file names, # in /run/keepalived. # Note: the instance name cannot be changed on a configuration reload instance NAME # Create pid files in /run/keepalived use_pid_dir # Poll to detect media link failure using ETHTOOL, MII or ioctl interface # otherwise uses netlink interface. linkbeat_use_polling # Time for main process to allow for child processes to exit on termination # in seconds. This can be needed for very large configurations. 
# (default: 5) child_wait_time SECS Note: All processes/scripts run by keepalived are run with parent death signal set to SIGTERM. All such processes/scripts should either not change the action for SIGTERM, or ensure that the process/script terminates once SIGTERM is received, possibly following any cleanup actions needed. # Global definitions configuration block global_defs { # In order to ensure that all processes read exactly the same configuration, # while the config is first read it is written, by default, to a memory based # file (or to an anonymous file in /tmp/ if memfd_create() is not supported). # If your configuration is very large, you may not want the copy to be # held in memory, in which case specifing the tmp_config_directory causes the # configuration to be written to an anonymous file on the filesystem on which # the specified directory resides, which must be writeable by keepalived. # This setting cannot be changed on a reload, and it should be specified as # early as possible in the configuration. tmp_config_directory DIRECTORY # config_save_dir causes keepalived to save configuration state and # configuration files before and after each reload. This is used for debugging # purposes if there appear to be problems related to repeated reloads. # The directory will be created if it does not exist, but all parent # directories must exist. config_save_dir DIRECTORY # Set the process names of the keepalived processes to the default values: # keepalived, keepalived_vrrp, keepalived_ipvs, keepalived_bfd process_names # Specify the individual process names process_name NAME vrrp_process_name NAME checker_process_name NAME bfd_process_name NAME # The startup and shutdown scripts are run once, when keepalived starts # before any child processes are run, and when keepalived stops after # all child processes have terminated, respectively. 
# The original motivation for adding this feature was that although # keepalived can setup IPVS configuration using firewall marks, there # was no mechanism for adding configuration to set the firewall marks # (or for removing it afterwards). # This feature can also be used to setup the iptables framework required # if using iptables (see vrrp_iptables option below), modify interface # settings, or anything else that can be done from a script or program. # Only one startup script and one shutdown script can be specified. # The timeouts (in seconds default 10 seconds) are the time allowed for # scripts to run; if the timeout expires the scripts will be killed (this # is to stop keepalived hanging waiting for the scripts to terminate). startup_script SCRIPT_NAME [username [groupname]] startup_script_timeout SECONDS # range [1,1000] shutdown_script SCRIPT_NAME [username [groupname]] shutdown_script_timeout SECONDS # range [1,1000] # Set of email To: notify notification_email { admin@example1.com ... } # email from address that will be in the header # (default: keepalived@<local host name>) notification_email_from admin@example.com # Remote SMTP server used to send notification email. # IP address or domain name with optional port number. # (default port number: 25) smtp_server 127.0.0.1 [<PORT>] # Name to use in HELO messages. # (default: local host name) smtp_helo_name <STRING> # SMTP server connection timeout in seconds. smtp_connect_timeout 30 # Sets default state for all smtp_alerts smtp_alert <BOOL> # Sets default state for vrrp smtp_alerts smtp_alert_vrrp <BOOL> # Sets default state for checker smtp_alerts smtp_alert_checker <BOOL> # Logs every failed real server check in syslog # (nevertheless, SMTP alert is only sent when all retry checks failed # and real server transitions to DOWN state) checker_log_all_failures <BOOL> # Don't send smtp alerts for fault conditions no_email_faults # String identifying the machine (doesn't have to be hostname). 
# (default: local host name) router_id <STRING> # Multicast Group to use for IPv4 VRRP adverts # Defaults to the RFC5798 IANA assigned VRRP multicast address 224.0.0.18 # which You typically do not want to change. vrrp_mcast_group4 224.0.0.18 # Multicast Group to use for IPv6 VRRP adverts # (default: ff02::12) vrrp_mcast_group6 ff02::12 # sets the default interface for static addresses. # (default: eth0) default_interface p33p1.3 # The sync daemon as provided by the IPVS kernel code only supports # one master and one backup daemon instance at a time to synchronize # the IPVS connection table. # See ipvsadm(8) man page for more details of the sync daemon. # Parameters are binding interface, and optional: # inst VRRP_INSTANCE (inst can be omitted for backward compatibility) # syncid (0 to 255) for lvs syncd, default is the VRID of vrrp instance, # or 0 if no vrrp instance # maxlen (1..65507) maximum packet length (limit is mtu - 20 - 8) # port (1..65535) UDP port number to use, default 8848 # ttl (1..255) # group - multicast group address(IPv4 or IPv6), default 224.0.0.81 # If VRRP_INSTANCE is not specified, both the master and backup sync daemons # will be run as long as keepalived is running, otherwise the sync daemon # master/backup state tracks the state of the specified vrrp instance: if # the vrrp instance is in master state, only the master sync daemon will run, # if the vrrp instance is not master, only the backup sync daemon will run. # NOTE: maxlen, port, ttl and group are only available on Linux 4.3 or later. # See kernel source doc/Documentation/networking/ipvs-sysctl.txt for details of # parameters controlling IPVS and the sync daemon. # /proc/net/ip_vs* provide some details about the state of IPVS. lvs_sync_daemon <INTERFACE> [[inst] <VRRP_INSTANCE>] [id <SYNC_ID>] \ [maxlen <LEN>] [port <PORT>] [ttl <TTL>] [group <IP ADDR>] # lvs_timeouts specifies the tcp, tcp_fin and udp connection tracking timeouts # in seconds. 
At least one value must be specified; not setting a value leaves # it unchanged from when keepalived started. lvs_timeouts [tcp SECS] [tcpfin SECS] [udp SECS] # flush any existing LVS configuration at startup lvs_flush # flush remaining LVS configuration at shutdown (for large configurations # this is much faster than the default approach of deleting each RS and # each VS individually). # If VS is specified, remove each keepalived managed virtual # server without explicitly removing the real servers (the kernel will # remove them). lvs_flush_on_stop [VS] # delay for second set of gratuitous ARPs after transition to MASTER. # in seconds, 0 for no second set. # (default: 5) vrrp_garp_master_delay 10 # number of gratuitous ARP messages to send at a time after # transition to MASTER. # (default: 5) vrrp_garp_master_repeat 1 # delay for second set of gratuitous ARPs after lower priority # advert received when MASTER. # (default: vrrp_garp_master_delay) vrrp_garp_lower_prio_delay 10 # Default value for vrrp down_timer_adverts. vrrp_down_timer_adverts [1:100] # number of gratuitous ARP messages to send at a time after # lower priority advert received when MASTER. # (default: vrrp_garp_master_repeat) vrrp_garp_lower_prio_repeat 1 # minimum time interval for refreshing gratuitous ARPs while MASTER. # in seconds (resolution seconds). # (default: 0 (no refreshing)) vrrp_garp_master_refresh 60 # number of gratuitous ARP messages to send at a time while MASTER # (default: 1) vrrp_garp_master_refresh_repeat 2 # Delay between gratuitous ARP messages sent on an interface # decimal, seconds (resolution usecs). # (default: 0) vrrp_garp_interval 0.001 # Delay between unsolicited NA messages sent on an interface # decimal, seconds (resolution usecs). # (default: 0) vrrp_gna_interval 0.000001 # By default keepalived sends 5 gratuitions ARP/NA messages at a # time, and after transitioning to MASTER sends a second block of # 5 messages 5 seconds later. 
# With modern switches this is unnecessary, so setting vrrp_min_garp # causes only one ARP/NA message to be sent, with no repeat 5 seconds # later. vrrp_min_garp [<BOOL>] # The following option causes periodic GARP/NA messages to be sent on # interfaces of VIPs/eVIPs that are not the interface of the VRRP # instance, in order to ensure that switch MAC caches are maintained # (specified in seconds). # Many switches have a default cache timeout of 300 seconds, and so # a garp repeat rate of 1/3rd of that would be sensible. The maximum # permitted value is 1 day (86400 seconds); # By default, it will only send on VMAC interfaces; specifying all # will cause it to send GARP/NA on each interface used by the VRRP instance. vrrp_garp_extra_if [all] 100 # If a lower priority advert is received, don't send another advert. # This causes adherence to the RFCs. Defaults to false, unless # strict_mode is set. vrrp_lower_prio_no_advert [<BOOL>] # If we are master and receive a higher priority advert, send an advert # (which will be lower priority than the other master), before we # transition to backup. This means that if the other master has # garp_lower_priority_repeat set, it will resend garp messages. # This is to get around the problem of their having been two simultaneous # masters, and the last GARP messages seen were from us. vrrp_higher_prio_send_advert [<BOOL>] # Set the default VRRP version to use # (default: 2, but IPv6 instances will use version 3) vrrp_version <2 or 3> # keepalived uses a firewall (either nftables or iptables) for two purposes: # i) To implement no_accept mode # ii) To stop IGMP/MLD/Router-Solicit packets being sent on VMAC interfaces, # and to move IGMP/MLD messages onto the underlying interface. # If both vrrp_iptables and vrrp_nftables are specified, keepalived will use # nftables and not iptables. 
Similarly, if the iptables command is generating # nftables configuration, or there is no iptables command installed, # keepalived will use nftables rather than iptables. # If neither vrrp_nftables or vrrp_iptables are specified but VMACs are in use # or no_accept is specified, keepalived will use nftables if it is available. # Use nftables as the firewall. # TABLENAME must not exist, and must be different for each # instance of keepalived running in the same network namespace. # Default tablename is keepalived, and priority is -1. # keepalived will create base chains in the table. # counters means counters are added to the rules (primarily for # debugging purposes). # ifindex means create IPv6 link local sets using ifindex rather # than ifnames. This is the default unless the vrrp_instance has # set dont_track_primary. The alternative is to use interface names # as part of the set key, but the nft utility prior to v0.8.3 will # then not output interface names properly. nftables [TABLENAME] nftables_priority PRIORITY nftables_counters nftables_ifindex # Similarly for IPVS iptables - used for setting fwmarks for virtual # server groups. keepalived will allocate a fwmark for each virtual # server group, so that only one virtual server for each group needs # to be configured in IPVS, by using a fwmark, and nftables will be # used to set the fwmark for each of the virtual server # address/protocol/port combinations specified. # nftables_ipvs_start_fwmark specifies the first fwmark for keepalived # to use (default 1000). This will be incremented for each subsequent # virtual server group. nftables_ipvs [TABLENAME] nftables_ipvs_priority PRIORITY nftables_ipvs_start_fwmark NUMBER # Use iptables as the firewall. # Note: it is necessary for the specified chain to exist in # the iptables and/or ip6tables configuration, and for the chain # to be called from an appropriate point in the iptables configuration. 
# It will probably be necessary to have this filtering after accepting # any ESTABLISHED,RELATED packets, because IPv4 might select the VIP as # the source address for outgoing connections. # Note: although the default chains that are used are INPUT and OUTPUT, # since those are the only chains that will always exist, it is not safe # or sensible to use those chains and specific chains should be created # and called from appropriate points in the iptables configuration. The # chains used for keepalived should not be used for any other purpose, and # should have no rules configured, other than the rules that keepalived # manages. # A startup_script (see above) can be used to create the chains and to # add rules to call them. A shutdown_script can be used to remove the # iptables configuration added by the startup_script. # Note2: If using ipsets, the iptables VIP rules are appended to the end # of the specified chains; if not using ipsets, the VIP rules are inserted # at the beginning of the chains. Any IGMP rules are always appended to # the end of the chains. # (default: INPUT) vrrp_iptables keepalived # or for outbound filtering as well # Note, outbound filtering won't work with IPv4, since the VIP can be # selected as the source address for an outgoing connection. With IPv6 # this is unlikely since the addresses are deprecated. vrrp_iptables keepalived_in keepalived_out # or to to use default chains (INPUT and OUTPUT) vrrp_iptables # Keepalived may have the option to use ipsets in conjunction with # iptables. If so, then the ipset names can be specified, defaults # as below. If no names are specified, ipsets will not be used, # otherwise any omitted names will be constructed by adding "_if" # and/or "6" and _igmp/_mld to previously specified names. 
vrrp_ipsets [keepalived [keepalived6 [keepalived_if6 [keepalived_igmp [keepalived_mld]]]]] # An alternative to moving IGMP messages from VMACs to their parent interfaces # is to disable them altogether in the kernel by setting # igmp_link_local_mcast_reports false. # This stops IGMP join etc messages for 224.0.0.0/24, since they should # always be forwarded to all interfaces (see RFC4541). # This is available from Linux 4.3 onwards. disable_local_igmp # The following enables checking that when in unicast mode, the # source address of a VRRP packet is one of our unicast peers. vrrp_check_unicast_src # Checking all the addresses in a received VRRP advert can be time # consuming. Setting this flag means the check won't be carried out # if the advert is from the same master router as the previous advert # received. # (default: don't skip) vrrp_skip_check_adv_addr # Enforce strict VRRP protocol compliance. This currently includes # enforcing the following. Please note that other checks may be # added in the future if they are found to be missing: # 0 VIPs not allowed # unicast peers not allowed # IPv6 addresses not allowed in VRRP version 2 # First IPv6 VIP is not link local # State MASTER can be configured if and only if priority is 255 # Authentication is not supported # Preempt delay is not supported # Accept mode cannot be set for VRRPv2 # If accept/no accept is not specified, accept is set if priority # is 255 aand cleared otherwise # Gratuitous ARP repeats cannot be enabled # Cannot clear lower_prio_no_advert # Cannot set higher_prio_send_advert # Cannot use vmac_xmit_base vrrp_strict # Send vrrp instance priority notifications on notify FIFOs. vrrp_notify_priority_changes <BOOL> # The following options can be used if vrrp, checker or bfd processes # are timing out. This can be seen by a backup vrrp instance becoming # master even when the master is still running, because the master or # backup system is too busy to process vrrp packets. 
# -- # keepalived can, if it detects that it is not running sufficiently # soon after a timer should expire, increase its priority, first # of all switching to realtime scheduling, and if that is not # sufficient, it will then increase its realtime priority by one each # time it detects a further delay in running. If the event that realtime # scheduling is enabled, RLIMIT_RTTIME will be set, using the values for # {bfd,checker,vrrp}_rlimit_rttime (see below). These values may need # to be increased for slower processors. # -- # To limit the maximum increased automatic priority, specify the following # (0 doesn't use automatic priority increases, and is the default. -1 disables # the warning message at startup). Omitting the priority sets the maximum value. max_auto_priority [<-1 to 99>] # 99 is really sched_get_priority_max(SCHED_RR) # Minimum delay in microseconds after timer expires before keeplalived is # scheduled after which the process priority will be auto incremented # (default is 1000000 usecs (1 second), maximum is 10000000 (10 seconds)) min_auto_priority_delay <delay in usecs> # Set the vrrp child process priority (Negative values increase priority) vrrp_priority <-20 to 19> # Set the checker child process priority checker_priority <-20 to 19> # Set the BFD child process priority bfd_priority <-20 to 19> # Set the vrrp child process non swappable vrrp_no_swap # Set the checker child process non swappable checker_no_swap # Set the BFD child process non swappable bfd_no_swap # The following options can be used to force vrrp, checker and bfd # processes to run on a restricted CPU set. # You can either bind processes to a single CPU or define a set of # cpu. In that last case Linux kernel will be restricted to that cpu # set during scheduling. Forcing process binding to single CPU can # increase performances on heavy loaded box. 
# INTEGER following configuration keyword are representing cpu_id # as shown in /proc/cpuinfo on line "processor:" # -- # Set CPU Affinity for the vrrp child process vrrp_cpu_affinity <INTEGER> [<INTERGER>]...[<INTEGER>] # Set CPU Affinity for the checker child process checker_cpu_affinity <INTEGER> [<INTERGER>]...[<INTEGER>] # Set CPU Affinity for the bfd child process bfd_cpu_affinity <INTEGER> [<INTERGER>]...[<INTEGER>] # Set the vrrp child process to use real-time scheduling # at the specified priority vrrp_rt_priority <1..99> # Set the checker child process to use real-time scheduling # at the specified priority checker_rt_priority <1..99> # Set the BFD child process to use real-time scheduling # at the specified priority bfd_rt_priority <1..99> # Set the limit on CPU time between blocking system calls, # in microseconds # (default: 10000) vrrp_rlimit_rttime >=2 checker_rlimit_rttime >=2 bfd_rlimit_rttime >=2 # If Keepalived has been build with SNMP support, the following # keywords are available. # Note: Keepalived, checker and RFC support can be individually # enabled/disabled # -- # Specify socket to use for connecting to SNMP master agent # (see source module keepalived/vrrp/vrrp_snmp.c for more details) # (default: unix:/var/agentx/master) snmp_socket udp:1.2.3.4:705 # enable SNMP handling of vrrp element of KEEPALIVED MIB enable_snmp_vrrp # enable SNMP handling of checker element of KEEPALIVED MIB enable_snmp_checker # enable SNMP handling of RFC2787 and RFC6527 VRRP MIBs enable_snmp_rfc # enable SNMP handling of RFC2787 VRRP MIB enable_snmp_rfcv2 # enable SNMP handling of RFC6527 VRRP MIB enable_snmp_rfcv3 # enable SNMP traps enable_traps # If Keepalived has been build with DBus support, the following # keywords are available. 
# -- # Enable the DBus interface enable_dbus # Name of DBus service # Useful if you want to run multiple keepalived processes with DBus enabled # (default: org.keepalived.Vrrp1) dbus_service_name SERVICE_NAME # Specify the default username/groupname to run scripts under. # If this option is not specified, the user defaults to keepalived_script # if that user exists, otherwise the uid/gid under which keepalived is running. # If groupname is not specified, it defaults to the user's group. script_user username [groupname] # Don't run scripts configured to be run as root if any part of the path # is writable by a non-root user. enable_script_security # Rather than using notify scripts, specifying a fifo allows more # efficient processing of notify events, and guarantees that they # will be delivered in the correct sequence. # NOTE: the FIFO names must all be different # -- # FIFO to write notify events to # See vrrp_notify_fifo and lvs_notify_fifo for format of output # For further details, see the description under vrrp_sync_group. # see doc/samples/sample_notify_fifo.sh for sample usage. notify_fifo FIFO_NAME [username [groupname]] # script to be run by keepalived to process notify events # The FIFO name will be passed to the script as the last parameter notify_fifo_script STRING|QUOTED_STRING [username [groupname]] # FIFO to write vrrp notify events to. # The string written will be a line of the form: INSTANCE "VI_1" MASTER 100 # and will be terminated with a new line character. # For further details of the output, see the description under vrrp_sync_group # and doc/samples/sample_notify_fifo.sh for sample usage. 
    vrrp_notify_fifo FIFO_NAME [username [groupname]]

    # script to be run by keepalived to process vrrp notify events
    # The FIFO name will be passed to the script as the last parameter
    vrrp_notify_fifo_script STRING|QUOTED_STRING [username [groupname]]

    # FIFO to write notify healthchecker events to
    # The string written will be a line of the form:
    #   VS [192.168.201.15]:tcp:80 {UP|DOWN}
    #   RS [1.2.3.4]:tcp:80 [192.168.201.15]:tcp:80 {UP|DOWN}
    # and will be terminated with a new line character.
    lvs_notify_fifo FIFO_NAME [username [groupname]]

    # script to be run by keepalived to process healthchecker notify events
    # The FIFO name will be passed to the script as the last parameter
    lvs_notify_fifo_script STRING|QUOTED_STRING [username [groupname]]

    # By default, when keepalived reloads, the vrrp instance and sync group states
    # are not written to the relevant FIFOs. Setting this option will cause the
    # states to be sent to the FIFO(s) when keepalived reloads.
    fifo_write_vrrp_states_on_reload

    # Allow configuration to include interfaces that don't exist at startup.
    # This allows keepalived to work with interfaces that may be deleted and restored
    # and also allows virtual and static routes and rules on VMAC interfaces.
    # allow_if_changes allows an interface to be deleted and recreated with a
    # different type or underlying interface, eg changing from vlan to macvlan
    # or changing a macvlan from eth1 to eth2. This is predominantly used for
    # reporting duplicate VRID errors at startup if allow_if_changes is not set.
    dynamic_interfaces [allow_if_changes]

    # The following options are only needed for large configurations, where either
    # keepalived creates a large number of interfaces, or the system has a large
    # number of interfaces. These options need to be used only if
    # "Netlink: Receive buffer overrun" messages are seen in the system logs.
    # If the buffer size needed exceeds the value in /proc/sys/net/core/rmem_max
    # the corresponding force option will need to be set.
    # --
    # Set netlink receive buffer size. This is useful for
    # very large configurations where a large number of interfaces exist, and
    # the initial read of the interfaces on the system causes a netlink buffer
    # overrun.
    vrrp_netlink_cmd_rcv_bufs BYTES
    vrrp_netlink_cmd_rcv_bufs_force <BOOL>
    vrrp_netlink_monitor_rcv_bufs BYTES
    vrrp_netlink_monitor_rcv_bufs_force <BOOL>

    # The vrrp netlink command and monitor socket, the checker command and
    # monitor socket, and the process monitor buffer sizes can be set independently.
    # The force flag means to use SO_RCVBUFFORCE, so that the buffer size
    # can exceed /proc/sys/net/core/rmem_max.
    lvs_netlink_cmd_rcv_bufs BYTES
    lvs_netlink_cmd_rcv_bufs_force <BOOL>
    lvs_netlink_monitor_rcv_bufs BYTES
    lvs_netlink_monitor_rcv_bufs_force <BOOL>

    # As a guide for process_monitor_rcv_bufs for 1400 processes terminating
    # simultaneously, 212992 (the default on some systems) is insufficient, whereas
    # 500000 is sufficient.
    process_monitor_rcv_bufs BYTES
    process_monitor_rcv_bufs_force <BOOL>

    # When a socket is opened, the kernel configures the max rx buffer size for
    # the socket to /proc/sys/net/core/rmem_default. On some systems this can be
    # very large, and even generally this can be much larger than necessary.
    # This isn't a problem so long as keepalived is reading all queued data from
    # its sockets, but if rmem_default was set sufficiently large, and if for
    # some reason keepalived stopped reading, it could consume all system memory.
    # The vrrp_rx_bufs_policy allows configuring of the rx bufs size when the
    # sockets are opened. If the policy is MTU, the rx buf size is configured
    # to the total of interface's MTU * vrrp_rx_bufs_multiplier for each vrrp
    # instance using the socket. Likewise, if the policy is ADVERT, then it is
    # the total of each vrrp instances advert packet size * multiplier.
    # (default: use system default)
    vrrp_rx_bufs_policy [MTU|ADVERT|NUMBER]

    # (default: 3)
    vrrp_rx_bufs_multiplier NUMBER

    # Send notifies at startup for real servers that are starting up
    rs_init_notifies

    # Don't send an email every time a real server checker changes state;
    # only send email when a real server is added or removed
    no_checker_emails

    # The umask to use for creating files. The number can be specified in hex, octal
    # or decimal. BITS are I{R|W|X}{USR|GRP|OTH}, e.g. IRGRP, separated by '|'s.
    # The default umask is IWGRP | IWOTH. This option cannot override the
    # command-line option.
    umask [NUMBER|BITS]

    # On some systems when bond interfaces are created, they can start passing traffic
    # and then have a several second gap when they stop passing traffic inbound. This
    # can mean that if keepalived is started at boot time, i.e. at the same time as
    # bond interfaces are being created, keepalived doesn't receive adverts and hence
    # can become master despite an instance with higher priority sending adverts.
    # This option specifies a delay in seconds before vrrp instances start up after
    # keepalived starts.
    vrrp_startup_delay 5.5

    # The following will cause logging of receipt of VRRP adverts for VRIDs not configured
    # on the interface on which they are received.
    log_unknown_vrids

    # Specify the prefix for generated VMAC names (default "vrrp")
    vmac_prefix STRING

    # Specify the prefix for generated VMAC names for VIPs which use a VMAC but are not
    # on the VRRP instance's interface (default vmac_prefix value)
    vmac_addr_prefix STRING

    # Specify random seed for ${_RANDOM}, to make configurations repeatable (default
    # is to use a seed based on the time, so that each time a different configuration
    # will be generated).
    random_seed UNSIGNED_INT

    # If a configuration reload is attempted with an updated configuration file that has
    # errors, keepalived may terminate, and possibly enter a loop indefinitely restarting
    # and terminating.
    # If reload_check_config is set, then keepalived will attempt to
    # validate the configuration before initiating a reload, and only initiate the reload
    # if the configuration is valid.
    reload_check_config [LOG_FILE]

    # Treat any missing include file as an error. The OPTIONS can be any combination of
    #   readable       - error if a match is not a readable file
    #   match          - error if no file matches (unless wildcard specified)
    #   wildcard_match - error if no file matches (even if wildcard specified)
    #   brace_match    - error if a brace expansion does not match a file
    # Note: match, wildcard_match and brace_match include the readable check.
    # The setting of include_check is saved when a new include file is opened, and restored
    # when the file is closed. This means that the include_check setting when reading a
    # file cannot be changed by a subsequently included file. To change the setting for all
    # included files, include_check should be set at the beginning of the configuration file
    # specified in the command line (default /etc/keepalived/keepalived.conf).
    # Note2: If the libc glob() function does not support GLOB_ALTDIRFUNC (e.g. Musl libc as
    # on Alpine Linux etc.), then only readable and wildcard_match of the above options will work.
    # It is possible to add or remove individual settings; '+' means add the following
    # checks, '-' means remove the following checks. For example
    #   include_check +match -wildcard_match
    # adds the requirement that there is a matching file, and removes the requirement for
    # wildcard matches.
    # If no option is specified, it is the same as specifying all options.
    include_check [OPTIONS]

    # reload_time_file allows a reload of keepalived to be scheduled in the future. This is
    # particularly useful if there is a master keepalived and one or more backup keepalived
    # instances and the new configuration is incompatible with the previous configuration,
    # e.g. adding or removing VIPs which would cause adverts to be rejected.
    # All the instances can be scheduled to reload at the same time, thereby ensuring that
    # no mismatching adverts are received by the backup instances.
    # The configuration specifies a file which keepalived will monitor. The first line of
    # the file must contain a valid time or date/time exactly in the formats specified below.
    # When keepalived starts up, it reads the file if it exists, and schedules a reload at
    # the specified time. If the file does not exist, then when it is subsequently created
    # a reload will be scheduled. If the file is updated, the reload time will be modified
    # accordingly. If the file is deleted, the reload is cancelled.
    # Normally when the reload occurs the specified file is deleted, since the reload has
    # been done; if the file included a date then the reload will be in the past and so
    # ignored. However, if there is no date, then if the file were reread following the
    # reload, a reload would be scheduled for 24 hours time. In order to stop this, the
    # file is deleted (unlinked) by default. If reload_repeat is specified, then the
    # file is not deleted, and if the file contains a time only with no date, then
    # keepalived will keep reloading at that time every day until the file is removed or
    # modified.
    # If the directory containing the file does not exist at startup/reload, or if the
    # directory is removed or renamed, then no future scheduled reloads will occur until
    # a manual (SIGHUP) reload is done or keepalived restarts.
    # The permitted formats of the entry in the timer file are precisely:
    #   HH:MM:SS
    #   YY-MM-DD HH:MM:SS
    #   YYYY-MM-DD HH:MM:SS
    # each with an optional 'Z' at the end.
    # There must be no leading or trailing whitespace, and only one space between the date
    # and the time.
    # If there is a 'Z' at the end of the time, the time is parsed as UTC, otherwise the
    # time is the localtime for the environment in which keepalived is running.
    # If the systems which are being reloaded are in different timezones, it is
    # probably safer to use UTC.
    # If using local time with daylight savings, beware that some times don't exist and
    # some times are duplicated and hence ambiguous.
    reload_time_file ABSOLUTE-PATHNAME-OF-FILE
    reload_repeat

    # Some users frequently update their configurations and reload keepalived. reload_file
    # provides a mechanism that allows the configuration update processes not to update the
    # configuration files while keepalived is reading them.
    # The reload file will be created by keepalived before it starts reading configuration
    # files, unless the file exists. If the file already exists, it will be truncated. Once
    # keepalived has completed reading the files it will remove the reload file.
    # If reload_file with no file name is specified, the default filename keepalived.reload
    # in the PID directory will be used.
    # The best way to use the reload file is for the configuration update process to touch
    # the reload file before it signals keepalived to reload, and then wait for the file
    # to be deleted, which indicates that keepalived has finished reading the config files.
    # When keepalived starts reading the configuration files, since it truncates the reload
    # file, if the update process creates the reload_file with non-zero size, it can detect
    # the reloading starting by the reload_file becoming zero length.
    reload_file [ABSOLUTE-PATHNAME-OF-FILE]

    # Sending SIGUSR1 to keepalived causes it to dump its data structures
    # for debugging purposes, although some users use this feature and
    # process the output. Please note that the format of the .data files
    # produced is not guaranteed to maintain backward compatibility.
    # The standard file names are keepalived_parent.data, keepalived.data,
    # keepalived_check.data and keepalived_bfd.data. This causes a problem
    # if more than one keepalived instance is running on a system.
    # In order to alleviate this, enabling data_use_instance includes the
    # instance name and network namespace in the file name of the .data files.
    data_use_instance [<BOOL>]
}

Linkbeat interfaces
The linkbeat_interfaces block allows specifying which interfaces should use polling via MII, Ethtool or ioctl status rather than rely on netlink status updates. This allows more granular control of global definition linkbeat_use_polling.
This option is preferred over the deprecated use of linkbeat_use_polling in a vrrp_instance block, since the latter only allows using linkbeat on the interface of the vrrp_instance itself, whereas track_interface and virtual_ipaddresses and virtual_iproutes may require monitoring other interfaces, which may need to use linkbeat polling.
The default polling type to use is MII, unless that isn't supported in which case ETHTOOL is used, and if that isn't supported then ioctl polling. The preferred type of polling to use can be specified with MII or ETHTOOL or IOCTL after the interface name, but if that type isn't supported, a supported type will be used.
The syntax for linkbeat_interfaces is:
linkbeat_interfaces {
    eth2
    enp2s0 ETHTOOL
}

Static track groups
Static track groups are used to allow vrrp instances to track static addresses, routes and rules. If a static address/route/rule specifies a track group, then if the address/route/rule is deleted and cannot be restored, the vrrp instance will transition to fault state.
The syntax for a track group is:
track_group GROUP1 {
    group {
        VI_1
        VI_2
    }
}

Static routes/addresses/rules
Keepalived can configure static addresses, routes, and rules. These addresses, routes and rules are NOT moved by vrrpd, they stay on the machine. If you already have IPs and routes on your machines and your machines can ping each other, you don't need this section. The syntax for rules and routes is the same as for ip rule add/ip route add (except shortened option names are not supported due to ambiguities).
The track_group specification refers to a named track_group which lists the vrrp instances which will track the address, i.e. if the address is deleted the vrrp instances will transition to backup.
NOTE: since rules without preferences can be added in different orders due to vrrp instances transitioning from master to backup etc, rules need to have a preference. If a preference is not specified, keepalived will assign one, but it will probably not be what you want.
The syntax is the same for virtual addresses and virtual routes. If no dev element is specified, it defaults to default_interface (default eth0). Note: the broadcast address may be specified as '-' or '+' to clear or set the host bits of the address.
If a route or rule could apply to either IPv4 or IPv6 it will default to IPv4. To force a route/rule to be IPv6, add the keyword "inet6".
By default keepalived prepends routes (the kernel's default), which adds the route before any matching routes (this is the same behaviour as the (undocumented) 'ip route prepend' command, unlike the 'ip route add' command, which only adds the route if there is no matching route). If 'append' is specified, the behaviour is the same as the 'ip route append' command, i.e. the route is added after any matching route. Note: the rules for whether a route matches differ between IPv4 and IPv6; for example specifying a different proto means a matching route can be prepended/appended for IPv4 but not for IPv6. If in doubt, test it using the 'ip route add/prepend/append' commands.
static_ipaddress {
    <IPADDR>[/<MASK>] [brd <IPADDR>] [dev <STRING>] [scope <SCOPE>]
                      [label <LABEL>] [peer <IPADDR>] [home]
                      [-nodad] [mngtmpaddr] [noprefixroute]
                      [autojoin] [track_group GROUP]
                      [preferred_lft nn|forever]
    192.168.1.1/24 dev eth0 scope global
    ...
}

static_routes {
    192.168.2.0/24 via 192.168.1.100 dev eth0 track_group GROUP1

    192.168.100.0/24 table 6909 nexthop via 192.168.101.1 dev wlan0
                     onlink weight 1 nexthop via 192.168.101.2
                     dev wlan0 onlink weight 2

    192.168.200.0/24 dev p33p1.2 table 6909 tos 0x04 protocol bird
                     scope link priority 12 mtu 1000 hoplimit 100
                     advmss 101 rtt 102 rttvar 103 reordering 104
                     window 105 cwnd 106 ssthresh lock 107 realms
                     PQA/0x14 rto_min 108 initcwnd 109 initrwnd 110
                     vrf blue features ecn add

    2001:470:69e9:1:2::4 dev p33p1.2 table 6909 tos 0x04 protocol
                         bird scope link priority 12 mtu 1000
                         hoplimit 100 advmss 101 rtt 102 rttvar 103
                         reordering 104 window 105 cwnd 106 ssthresh
                         lock 107 rto_min 108 initcwnd 109 append
                         initrwnd 110 features ecn fastopen_no_cookie 1
    ...
}

static_rules {
    from 192.168.2.0/24 table 1 track_group GROUP1

    to 192.168.2.0/24 table 1

    from 192.168.28.0/24 to 192.168.29.0/26 table small iif p33p1
                         oif wlan0 tos 22 fwmark 24/12
                         preference 39 realms 30/20 goto 40

    to 1:2:3:4:5:6:7:0/112 from 7:6:5:4:3:2::/96 table 6908
                           uidrange 10000-19999

    to 1:2:3:4:6:6:7:0/112 from 8:6:5:4:3:2::/96 l3mdev protocol 12
                           ip_proto UDP sport 10-20 dport 20-30
    ...
}

Track files
Adds a file to be monitored. The file will be read whenever it is modified. The value in the file will be recorded for all VRRP instances, sync groups and real servers which monitor it. Note that the file will only be read if at least one VRRP instance, sync group or real server monitors it.
A value will be read as a number in text from the file. If the weight configured against the track_file is 0, a non-zero value in the file will be treated as a failure status, and a zero value will be treated as an OK status, otherwise the value will be multiplied by the weight configured in the track_file statement.
For VRRP instances, if the result is less than -253 anything monitoring the script will transition to the fault state (the weight can be 254 to allow for a negative value being read from the file).
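As a minimal sketch of the weight 0 case, the fragment below uses a track file as a manual maintenance switch for the VI_postfix instance from this article. The track file name postfix_maintenance and the path /etc/keepalived/postfix.maintenance are hypothetical. Older keepalived releases may accept only the vrrp_track_file spelling or may lack init_file, so the man page of the installed version should be checked first.

```conf
# Hypothetical maintenance switch: with weight 0, a non-zero value
# in the file is treated as a failure and a zero value as OK.
track_file postfix_maintenance {
    file "/etc/keepalived/postfix.maintenance"
    weight 0
    # write 0 to the file at startup if the file does not exist yet
    init_file 0
}

vrrp_instance VI_postfix {
    # the existing settings of the instance stay as they are;
    # only the tracking block is added
    track_file {
        postfix_maintenance
    }
}
```

Under these assumptions, writing 1 to the file should put the instance into the fault state, so the virtual address moves to the peer, and writing 0 should return it to normal operation.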
If the vrrp instance or sync group is not the address owner and the result is between -253 and 253, the result will be added to the initial priority of the VRRP instance (a negative value will reduce the priority), although the effective priority will be limited to the range [1,254]. Likewise for real servers.
If a vrrp instance using a track_file is a member of a sync group, unless sync_group_tracking_weight is set on the group, weight 0 must be set. Likewise, if the vrrp instance is the address owner, weight 0 must also be set.
For real servers monitoring the file, the limits of values read from the track file are 2147483648 to -2147483648. The value, once multiplied by the weight, will be added to the real server's IPVS weight. If the result is <= than 2147483648 then the checker will be in the FAULT state.
NOTE: weights for track_file for real servers are not fully implemented yet. In particular allowing weight 0, handling negative calculated values and reloading.
The syntax for track file is:
track_file <STRING> {    # vrrp_track_file is a deprecated synonym
    # file to track (weight defaults to 1)
    file <QUOTED_STRING>

    # optional default weight
    weight <-2147483647..2147483647> [reverse]

    # create the file and/or initialise the value
    # This causes VALUE (default 0) to be written to
    # the specified file at startup if the file doesn't
    # exist, unless overwrite is specified in which case
    # any existing file contents will be overwritten with
    # the specified value.
    init_file [VALUE] [overwrite]
}

VRRP track processes
The configuration block looks like:
vrrp_track_process <STRING> {
    # process to monitor (with optional parameters)
    # A quoted string is treated as a single element, so if the first item
    # after the process keyword is quoted, that will be the command name.
    # For example:
    #   process "/tmp/a b" param1 "param 2"
    # would mean a process named '/tmp/a b' (quotes removed) with 2 parameters
    # 'param1' and 'param 2'.
    process <STRING>|<QUOTED_STRING> [<STRING>|<QUOTED_STRING> ...]

    # If matching parameters, this specifies a partial match (i.e. the first
    # n parameters match exactly), or an initial match, i.e. the last
    # parameter may be longer than the parameter configured.
    # To specify that a command must have no parameters, don't specify
    # any parameters, but specify param_match.
    param_match {initial|partial}

    # default weight (default is 1). For description of reverse, see track_process.
    # 'weight 0 reverse' will cause the vrrp instance to be down when the
    # quorum is up, and vice versa.
    weight <-254..254> [reverse]

    # minimum number of processes for success
    quorum NUM

    # maximum number of processes for success. For example, setting
    # this to 1 would cause a failure if two instances of the process
    # were running (but beware forks - see fork_delay below).
    # Setting this to 0 would mean failure if the matching process were
    # running at all.
    quorum_max NUM

    # time to delay after process quorum gained after fork before
    # consider process up (in fractions of second)
    # This is to avoid up/down bounce for fork/exec
    fork_delay SECS

    # time to delay after process quorum lost before
    # consider process down (in fractions of second)
    # This is to avoid down/up bounce after terminate/parent refork.
    terminate_delay SECS

    # this sets fork_delay and terminate_delay
    delay SECS

    # Normally process string is matched against the process name,
    # as shown on the Name: line in /proc/PID/status, unless
    # parameters are specified.
    # This option forces matching the full command line
    full_command
}

To avoid having to frequently run a track_script to monitor the existence of processes (often haproxy or nginx), vrrp_track_process can monitor whether other processes are running.
One difference from pgrep is track_process doesn't do a regular expression match of the command string, but does an exact match. 'pgrep ssh' will match an sshd process, this track_process will not (it is equivalent to pgrep "^ssh$").
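For the Postfix cluster in this article, a vrrp_track_process block could stand in for a periodically executed check script. The fragment below is a sketch under assumptions, not a tested configuration. It assumes that the Postfix master daemon shows up under the process name master (worth confirming on the Name: line of /proc/PID/status), that the installed keepalived version supports vrrp_track_process, and it uses a made-up block name, chk_postfix_master.

```conf
# Hypothetical block name; "master" is assumed to be the process name
# of the Postfix master daemon.
vrrp_track_process chk_postfix_master {
    process master
    # at least one matching process must be running
    quorum 1
    # sets both fork_delay and terminate_delay, so that a quick
    # restart of Postfix does not bounce the instance
    delay 2
    # unweighted: losing the quorum takes the instance down
    # instead of only adjusting its priority
    weight 0
}

vrrp_instance VI_postfix {
    # the existing settings of the instance stay as they are;
    # only the tracking block is added
    track_process {
        chk_postfix_master
    }
}
```

Because the match is exact, a process named master2 would not satisfy the quorum, whereas an unrelated process that is also named master would.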
If full_command is used (equivalent to pgrep -f), /proc/PID/cmdline is used, but any updates to cmdline will not be detected (a process shouldn't normally change it, although it is possible with great care, for example systemd).
Prior to Linux v3.2 track_process will not support detection of changes to a process name, since the kernel did not notify changes of process name prior to 3.2. Most processes do not change their process name, but, for example, firefox forks processes that change their process name to "Web Content". The process name referred to here is the contents of /proc/PID/comm.
Quorum is the number of matching processes that must be running for an OK status.
Delay might be useful if it is anticipated that a process may be reloaded (stopped and restarted), and it isn't desired to down and up a vrrp instance.
A positive weight means that an OK status will add <weight> to the priority of all VRRP instances which monitor it. Conversely, a negative weight will be subtracted from the initial priority in case of insufficient processes.
If the vrrp instance or sync group is not the address owner and the result is between -253 and 253, the result will be added to the initial priority of the VRRP instance (a negative value will reduce the priority), although the effective priority will be limited to the range [1,254].
If a vrrp instance using a track_process is a member of a sync group, unless sync_group_tracking_weight is set on the group, weight 0 must be set. Likewise, if the vrrp instance is the address owner, weight 0 must also be set.
Rationale for not using pgrep/pidof/killall and the like:
Every time pgrep or its equivalent is run, it iterates through the /proc/[1-9][0-9]* directories, and opens the status and cmdline pseudo files in each directory. The cmdline pseudo file is mapped to the process's address space, and so if that part of the process is swapped out, it will have to be fetched from the swap space.
pgrep etc also include zombie processes whereas keepalived does not, since they aren't running.
This implementation only iterates through /proc/[1-9][0-9]*/ directories at start up, and it won't even read the cmdline pseudo files if 'full_command' is not specified for any of the vrrp_track_process entries. After startup, it uses the process_events kernel <-> userspace connector to receive notification of process changes. If full_command is specified for any track_process instance, the cmdline pseudo file will have to be read upon notification of the creation of the new process, but at that time it is very unlikely that it will have already been swapped out.
On a busy system with a high number of process creations/terminations, using a track_script with pgrep/pidof/killall may be more efficient, although those processes are inefficient compared to the minimum that keepalived needs.
Using pgrep etc on a system that is swapping can have a significant detrimental impact on the performance of the system, due to having to fetch swapped memory from the swap space, thereby causing additional swapping.

BFD CONFIGURATION
This is an implementation of RFC5880 (Bidirectional forwarding detection), and this can be configured to work between 2 keepalived instances, but using unweighted track_bfds between a master/backup pair of VRRP instances means that the VRRP instance will only be able to come up if both VRRP instances are running, which somewhat defeats the purpose of VRRP.
This implementation has been tested with OpenBFDD (available at https://github.com/dyninc/OpenBFDD).
The syntax for bfd instance is:
bfd_instance <STRING> {
    # BFD Neighbor IP (synonym neighbour_ip)
    neighbor_ip <IP ADDRESS>

    # Source IP to use (optional, except in order to ensure that the
    # local port is valid, it is required)
    source_ip <IP ADDRESS>

    # Required min RX interval, in ms (resolution is micro-seconds e.g.
    # 3.312)
    # (default is 10 ms)
    min_rx <DECIMAL>

    # Desired min TX interval, in ms (resolution is micro-seconds)
    # (default is 10 ms)
    min_tx <DECIMAL>

    # Desired idle TX interval, in ms (resolution is micro-seconds)
    # (default is 1000 ms)
    idle_tx <DECIMAL>

    # Number of missed packets after
    # which the session is declared down
    # (default is 5)
    multiplier <INTEGER>

    # Operate in passive mode (default is active)
    passive

    # outgoing IPv4 ttl to use (default 255)
    ttl <INTEGER>

    # outgoing IPv6 hoplimit to use (default 64)
    hoplimit <INTEGER>

    # maximum reduction of ttl/hoplimit
    # in received packet (default 0)
    # (255 disables hop count checking)
    max_hops <INTEGER>

    # Default tracking weight
    # Normally, positive weights are added to the vrrp instance priority when
    # the bfd instance is up, negative weights reduce the priority when it is down.
    # However, if reverse is specified, the priority is decreased when up and
    # increased when down. 'weight 0 reverse' will cause the vrrp instance to be down
    # when the bfd instance is up, and vice versa.
    weight <-253:253> [reverse]

    # Normally bfd event notifications are sent to both the VRRP and checker processes.
    # Specifying vrrp or checker will cause event notifications for this bfd_instance
    # only to be sent to the specified process
    vrrp
    checker
}

VRRPD CONFIGURATION
contains subblocks of VRRP script(s), VRRP synchronization group(s), VRRP gratuitous ARP and unsolicited neighbour advert delay group(s) and VRRP instance(s)

VRRP script(s)
The script will be executed periodically, every <interval> seconds. Its exit code will be recorded for all VRRP instances which monitor it. Note that the script will only be executed if at least one VRRP instance monitors it.
The default weight equals 0, which means that any VRRP instance monitoring the script will transition to the fault state after <fall> consecutive failures of the script.
After that, <rise> consecutive successes will cause VRRP instances to leave the fault state, unless they are also in the fault state due to other scripts or interfaces that they are tracking.
A positive weight means that <rise> successes will add <weight> to the priority of all VRRP instances which monitor it. Conversely, a negative weight will be subtracted from the initial priority in case of <fall> failures.
The syntax for the vrrp script is:
# Adds a script to be executed periodically. Its exit code will be
# recorded for all VRRP instances and sync groups which are monitoring it.
vrrp_script <SCRIPT_NAME> {
    # path of the script to execute
    script <STRING>|<QUOTED-STRING>

    # seconds between script invocations, (default: 1 second)
    interval <INTEGER>

    # seconds after which script is considered to have failed
    timeout <INTEGER>

    # adjust priority by this weight, (default: 0)
    # For description of reverse, see track_script.
    # 'weight 0 reverse' will cause the vrrp instance to be down when the
    # script is up, and vice versa.
    weight <INTEGER:-253..253> [reverse]

    # required number of successes for OK transition
    rise <INTEGER>

    # required number of failures for KO transition
    fall <INTEGER>

    # user/group names to run script under.
    # group defaults to group of user
    user USERNAME [GROUPNAME]

    # assume script initially is in failed state
    init_fail
}

VRRP synchronization group(s)
VRRP Sync Group is an extension to VRRP protocol. The main goal is to define a bundle of VRRP instances to get synchronized together so that transition of one instance will be reflected to other group members.
In addition there is an enhanced notify feature for fine state transition catching.
You can also define multiple track policies in order to force state transition according to a third party event such as interface, scripts, file, BFD.
Important: for a SYNC group to run reliably, it is vital that all instances in the group are MASTER or that they are all either BACKUP or FAULT.
A situation with half the instances having higher priority on machine A and the other half having higher priority on machine B will lead to constant re-elections. For this reason, when instances are grouped, any track scripts/files configured against member VRRP instances will have their tracking weights automatically set to zero, in order to avoid inconsistent priorities across instances.
The syntax for vrrp_sync_group is:
vrrp_sync_group <STRING> {
    group {
        # name of the vrrp_instance (see below)
        # Set of VRRP_Instance string
        <STRING>
        <STRING>
        ...
    }

    # Synchronization group tracking interface, script, file & bfd will
    # update the status/priority of all VRRP instances which are members
    # of the sync group.
    # 'weight 0 reverse' will cause the vrrp instance to be down when the
    # interface is up, and vice versa.
    track_interface {
        eth0
        eth1
        eth2 weight <-253..253> [reverse]
        ...
    }

    # add a tracking script to the sync group (<SCRIPT_NAME> is the name
    # of the vrrp_script entry) go to FAULT state if any of these go down
    # if unweighted.
    # reverse causes the direction of the adjustment of the priority to be reversed.
    track_script {
        <SCRIPT_NAME>
        <SCRIPT_NAME> weight <-253..253> [reverse|noreverse]
    }

    # Files whose state we monitor, value is added to effective priority.
    # <STRING> is the name of a track_file
    # weight defaults to weight configured in track_file
    track_file {
        <STRING>
        <STRING> weight <-254..254> [reverse|noreverse]
        ...
    }

    # Process to monitor, weight is added to effective priority.
    # <STRING> is the name of a vrrp_track_process
    # weight defaults to weight configured in vrrp_track_process.
    # See vrrp_instance track_process for description of weight.
    track_process {
        <STRING>
        <STRING> weight <-254..254> [reverse|noreverse]
        ...
    }

    # BFD instances we monitor, value is added to effective priority.
    # <STRING> is the name of a BFD instance
    track_bfd {
        <STRING>
        <STRING>
        <STRING> weight <INTEGER: -253..253> [reverse|noreverse]
        ...
    }

    # notify scripts and alerts are optional
    #
    # filenames of scripts to run on transitions can be unquoted (if
    # just filename) or quoted (if it has parameters)
    # The username and groupname specify the user and group
    # under which the scripts should be run. If username is
    # specified, the group defaults to the group of the user.
    # If username is not specified, they default to the
    # global script_user and script_group

    # to MASTER transition
    notify_master /path/to_master.sh [username [groupname]]

    # to BACKUP transition
    notify_backup /path/to_backup.sh [username [groupname]]

    # FAULT transition
    notify_fault "/path/fault.sh VG_1" [username [groupname]]

    # executed when stopping vrrp
    notify_stop <STRING>|<QUOTED-STRING> [username [groupname]]

    # notify_deleted causes DELETED to be sent to notifies rather
    # than the default FAULT after a vrrp instance is deleted during a
    # reload. If a script is specified, that script will be executed
    # as well.
    notify_deleted [<STRING>|<QUOTED-STRING> [username [groupname]]]

    # for ANY state transition.
    # "notify" script is called AFTER the notify_* script(s) and
    # is executed with 4 additional arguments after the configured
    # arguments provided by Keepalived:
    #   $(n-3) = "GROUP"|"INSTANCE"
    #   $(n-2) = name of the group or instance
    #   $(n-1) = target state of transition (stop only applies to instances)
    #            ("MASTER"|"BACKUP"|"FAULT"|"STOP"|"DELETED")
    #   $(n)   = priority value
    # $(n-3) and $(n-1) are ALWAYS sent in uppercase, and the possible
    # strings sent are the same ones listed above
    # ("GROUP"/"INSTANCE", "MASTER"/"BACKUP"/"FAULT"/"STOP"/"DELETED")
    # (note: DELETED is only applicable to instances)
    notify <STRING>|<QUOTED-STRING> [username [groupname]]

    # The notify fifo output is the same as the last 4 parameters for the "notify"
    # script, with the addition of "MASTER_RX_LOWER_PRI" instead of state for an
    # instance, and also "MASTER_PRIORITY" and "BACKUP_PRIORITY" if the priority
    # changes and notify_priority_changes is configured.
    # MASTER_RX_LOWER_PRI is used if a master needs to set some external state, such
    # as setting a secondary IP address when using Amazon AWS; if another keepalived
    # has transitioned to master due to a communications break, the lower priority
    # instance will have taken over the secondary IP address, and the proper master
    # needs to be able to restore it.

    # Send FIFO notifies for vrrp priority changes
    notify_priority_changes <BOOL>

    # Send email notification during state transition,
    # using addresses in global_defs above (default no,
    # unless global smtp_alert/smtp_alert_vrrp set)
    smtp_alert <BOOL>

    # DEPRECATED. Use track_interface, track_script and
    # track_file on vrrp_sync_groups instead.
    global_tracking

    # allow sync groups to use differing weights.
    # This probably WON'T WORK, but is a replacement for
    # global_tracking in case different weights were used
    # across different vrrp instances in the same sync group.
    sync_group_tracking_weight
}

VRRP gratuitous ARP and unsolicited neighbour advert delay group(s)
A garp_group specifies the delays between sending gratuitous ARPs and unsolicited neighbour advertisements. This is intended for when an upstream switch is unable to handle being flooded with ARPs/NAs.
Use interface when the limits apply on the single physical interface. Use interfaces when a group of interfaces are linked to the same switch and the limits apply to the switch as a whole.
Note: Only one of interface or interfaces should be used per block.
If the global vrrp_garp_interval and/or vrrp_gna_interval are set, any interfaces that aren't specified in a garp_group will inherit the global settings.
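As an illustration of the description above, a garp_group for two interfaces that share one upstream switch might look like the following sketch. The interface names (eth0, eth1) and the interval values are assumptions chosen for the example, not recommended settings; intervals are in seconds with microsecond resolution.

```
garp_group {
    # Assumed value: at least 1 ms between gratuitous ARPs
    garp_interval 0.001
    # Assumed value: at least 1 ms between unsolicited neighbour adverts
    gna_interval 0.001
    # Hypothetical interfaces attached to the same switch, so the
    # delays are aggregated across both. Because "interfaces" is used,
    # "interface" must not appear in this block.
    interfaces {
        eth0
        eth1
    }
}
```

If the limit applied to a single physical port instead, the interfaces block would be replaced by one interface line.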
The syntax for garp_group is :

garp_group {
    # Sets the interval between Gratuitous ARP (in seconds, resolution microseconds)
    garp_interval <DECIMAL>

    # Sets the default interval between unsolicited NA (in seconds, resolution microseconds)
    gna_interval <DECIMAL>

    # The physical interface to which the intervals apply
    interface <STRING>

    # A list of interfaces across which the delays are aggregated.
    interfaces {
        <STRING>
        <STRING>
        ...
    }
}

VRRP instance(s)
A VRRP Instance is the VRRP protocol key feature. It defines and configures VRRP behaviour to run on a specific interface. Each VRRP Instance is related to a unique interface.

The syntax for vrrp_instance is :

vrrp_instance <STRING> {
    # Initial state, MASTER|BACKUP
    # If the priority is 255, then the instance will transition immediately
    # to MASTER if state MASTER is specified; otherwise the instance will
    # wait between 3 and 4 advert intervals before it can transition,
    # depending on the priority.
    state MASTER

    # interface for inside_network, bound by vrrp.
    # Note: if using unicasting, the interface can be omitted as long
    # as the unicast addresses are not IPv6 link local addresses (this is
    # necessary, for example, if using asymmetric routing).
    # If the interface is omitted, then all VIPs and eVIPs should specify
    # the interface they are to be configured on, otherwise they will be
    # added to the default interface.
    interface eth0

    # If using unicasting without specifying an interface, the VRF to operate
    # in can be specified.
    vrf VRF_IF

    # Use VRRP Virtual MAC (macvlan).
    # The macvlan will be created on the configured interface for
    # the VRRP instance, and the VIPs, and eVIPs of the matching address
    # family, which do not specify a different interface will be configured
    # on the macvlan.
    # The VRRP adverts will also be sent and received on the macvlan
    # interface, unless vmac_xmit_base is configured.
    # NOTE: If sysctl net.ipv4.conf.all.rp_filter is set,
    # and this vrrp_instance is an IPv4 instance, using
    # this option will cause the individual interfaces to be
    # updated to the greater of their current setting, and
    # all.rp_filter, as will default.rp_filter, and all.rp_filter
    # will be set to 0.
    # The original settings are restored on termination.
    # NOTE 2: If using use_vmac with unicast peers,
    # vmac_xmit_base must be set.
    # The MAC address can be specified with only 5 octets, in which case
    # the virtual_router_id will be used as the last octet.
    use_vmac [<VMAC_INTERFACE_NAME>] [MAC_ADDRESS]

    # use_vmac_addr is used to create VMAC (macvlan) interfaces for
    # each interface that is used by a VIP or eVIP where the interface
    # is not the same as the interface on which the VRRP instance is
    # configured or the eVIP's address family does not match the VRRP
    # instance's. Alternatively, use_vmac can be specified against each
    # VIP/eVIP that specifies an interface (dev).
    # NOTE: if use_vmac is specified and an eVIP is not the same address
    # family as the vrrp instance, unless use_vmac_addr is specified, or
    # use_vmac is specified for the eVIP, the eVIP will be configured on
    # the vrrp instance's VMAC, which will have the wrong MAC address for
    # the address family of the eVIP.
    use_vmac_addr

    # Send/Recv VRRP messages from base interface instead of
    # VMAC interface
    vmac_xmit_base

    # Use IPVLAN interface. keepalived will create a mode L2
    # ipvlan interface on top of the specified interface.
    # For IPv4 instances, an IP address is required, for IPv6
    # the address is optional, in which case the link local
    # address will be used.
    # The mode flags default to bridge. NOTE: the mode flags must be the
    # same for all ipvlans on the same underlying interface.
    # It is safer to configure an interface name, in case keepalived crashes
    # and restarts, in which case it can more reliably find a previously
    # created interface.
use_ipvlan [<INTERFACE_NAME>] [IP_ADDRESS] [bridge|private|vepa] # force instance to use IPv6 (this option is deprecated since # the virtual ip addresses determine whether IPv4 or IPv6 is used). native_ipv6 # Ignore VRRP interface faults (default unset). # Note: when using IPv6, setting the interface administratively down, e.g. # 'ip link set IF down' will by default cause all IPv6 addresses to be # deleted from the interface, and consequently the VRRP instance will # go to fault state due to the addresses being deleted. Setting sysctl # net.ipv6.conf.IF.keep_addr_on_down to 1 will allow non link-local addresses # to remain when the interface is downed. dont_track_primary # optional, monitor these as well. # go to FAULT state if any of these go down if unweighted. # When a weight is specified in track_interface, instead of setting the vrrp # instance to the FAULT state in case of failure, its priority will be # increased by the weight when the interface is up (for positive weights), # or decreased by the weight's absolute value when the interface is down # (for negative weights), unless reverse is specified, in which case the # direction of adjustment of the priority is reversed. # The weight must be comprised between -253 and +253 inclusive. # 0 is the default behaviour which means that a failure implies a # FAULT state. The common practice is to use positive weights to count a # limited number of good services so that the server with the highest count # becomes master. Negative weights are better to count unexpected failures # among a high number of interfaces, as it will not saturate even with high # number of interfaces. Use reverse to increase priority if an interfaces is down track_interface { eth0 eth1 eth2 weight <-253..253> [reverse] ... 
    }

    # add a tracking script to the interface
    # (<SCRIPT_NAME> is the name of the vrrp_script entry)
    # The same principle as track_interface can be applied to track_script entries,
    # except that an unspecified weight means that the default weight declared in
    # the script will be used (which itself defaults to 0).
    # reverse causes the direction of the adjustment of the priority to be reversed.
    track_script {
        <SCRIPT_NAME>
        <SCRIPT_NAME> weight <-253..253> [reverse|no_reverse]
    }

    # Files whose state we monitor, value is added to effective priority.
    # <STRING> is the name of a track_file
    track_file {
        <STRING>
        <STRING>
        <STRING> weight <-254..254> [reverse|noreverse]
        ...
    }

    # Positive weights are added/subtracted when the process is running,
    # negative weights are subtracted/added when the process is not running.
    # If reverse is specified, the addition/subtraction is reversed.
    # <STRING> is the name of a vrrp_track_process
    # weight defaults to weight configured in vrrp_track_process
    track_process {
        <STRING>
        <STRING> weight <-254..254> [reverse|noreverse]
        ...
    }

    # BFD instances we monitor, value is added to effective priority,
    # unless reverse is specified, when the value is subtracted.
    # Positive weights are added/subtracted when the bfd instance is up,
    # negative weights are subtracted/added when the bfd instance is down.
    # <STRING> is the name of a BFD instance
    track_bfd {
        <STRING>
        <STRING>
        <STRING> weight <INTEGER: -253..253> [reverse|noreverse]
        ...
    }

    # default IP for binding vrrpd is the primary IP
    # on interface. If you want to hide the location of vrrpd,
    # use this IP as src_addr for multicast or unicast vrrp
    # packets. (since it's multicast, vrrpd will get the reply
    # packet no matter what src_addr is used).
    # optional
    mcast_src_ip <IPADDR>
    unicast_src_ip <IPADDR>

    # specify an alternative multicast address to use as the destination
    # of VRRP adverts and for listening for adverts.
    # Note, if you are using
    # multiple VRRP instances with VMACs and different multicast addresses
    # and the same VRID, you will have to specify alternative MAC addresses
    # for at least all but one of the VMACs.
    # IPv6 multicast addresses must be link-local, i.e. start ffX2:
    # Using different multicast addresses with IPv6 on the same interface without
    # using VMACs is only supported if the kernel supports IPV6_MULTICAST_ALL
    # (from Linux v4.20).
    mcast_dst_ip <MULTICAST_IPADDR>

    # if the configured src_ip doesn't exist or is removed put the
    # instance into fault state
    track_src_ip

    # VRRP version to run on interface
    # default is global parameter vrrp_version, but IPv6 instances will
    # always use version 3.
    version <2 or 3>

    # The following enables checking that when in unicast mode, the
    # source address of a VRRP packet is one of our unicast peers.
    check_unicast_src

    # Do not send VRRP adverts over a VRRP multicast group.
    # Instead it sends adverts to the following list of
    # ip addresses using unicast. It can be useful to use
    # the VRRP FSM and features in a networking
    # environment where multicast is not supported!
    # IP addresses specified can be IPv4 as well as IPv6.
    # If min_ttl and/or max_ttl are specified, the TTL/hop limit
    # of any received packet is checked against the specified
    # TTL range, and is discarded if it is outside the range.
    # Specifying min_ttl or max_ttl turns on check_unicast_src.
    unicast_peer {
        <IPADDR> [min_ttl {0..255}] [max_ttl {0..255}]
        ...
    }

    # It is not possible to operate in unicast mode without any peers.
    # Until v2.2.4 keepalived would silently operate in multicast mode
    # if no peers were specified but a unicast keyword had been specified.
    # Using this keyword stops defaulting to multicast if no peers are
    # specified and puts the VRRP instance into fault state.
    unicast_fault_no_peer

    # Specify the unicast TTL/HLIM for sending unicast adverts
    unicast_ttl {0..255}

    # The checksum calculation when using VRRPv3 changed after v1.3.6.
    # Setting this flag forces the old checksum algorithm to be used
    # to maintain backward compatibility, although keepalived will
    # attempt to maintain compatibility anyway if it sees an old
    # version checksum. Specifying never will turn off auto detection
    # of old checksums. [This option may not be enabled - check output
    # of `keepalived -v` for OLD_CHKSUM_COMPAT.]
    old_unicast_checksum [never]

    # interface specific settings, same as global parameters.
    # default to global parameters
    garp_master_delay 10
    garp_master_repeat 1
    garp_lower_prio_delay 10
    garp_lower_prio_repeat 1
    garp_master_refresh 60
    garp_master_refresh_repeat 2
    garp_extra_if [all] 100    # specifying 0 disables feature

    # The VRRP RFCs state that the master down timer is 3 advert intervals plus
    # a skew time. Setting down_timer_adverts means the master down timer will be
    # down_timer_adverts advert intervals.
    # The default is 3, to conform with the VRRP RFCs. Setting this to any other
    # value is a deviation from the VRRP protocol. All virtual routers for a given
    # VRRP instance MUST use the same value.
    down_timer_adverts [1-100]

    # If a lower priority advert is received, don't send another advert.
    # This causes adherence to the RFCs (defaults to global
    # vrrp_lower_priority_dont_send_advert).
    lower_prio_no_advert [<BOOL>]

    # If we are master and receive a higher priority advert, send an advert
    # (which will be lower priority than the other master), before we transition
    # to backup. This means that if the other master has garp_lower_prio_repeat
    # set, it will resend garp messages. This is to get around the problem of
    # there having been two simultaneous masters, and the last GARP
    # messages seen were from us.
    higher_prio_send_advert [<BOOL>]

    # arbitrary unique number from 1 to 255
    # used to differentiate multiple instances of vrrpd
    # running on the same network interface and address
    # family and multicast/unicast (and hence same socket).
# Note: using the same virtual_router_id with the same # address family on different interfaces has been known # to cause problems with some network switches; if you # are experiencing problems with using the same # virtual_router_id on different interfaces, but the problems # are resolved by not duplicating virtual_router_ids, your # network switches are probably not functioning correctly. # # Whilst in general it is important not to duplicate a # virtual_router_id on the same network interface, there is a # special case when using unicasting if the unicast peers for # the vrrp instances with duplicated virtual_router_ids on the # network interface do not overlap, in which case virtual_router_ids # can be duplicated. # It is also possible to duplicate virtual_router_ids on an # interface with multicasting if different multicast addresses # are used (see mcast_dst_ip). virtual_router_id 51 # for electing MASTER, highest priority wins. # The valid range of values for priority is [1-255], with priority # 255 meaning "address owner". # To be MASTER, it is recommended to make this 50 more than on # other machines. All systems should have different priorities # in order to make behaviour deterministic. If you want to stop # a higher priority instance taking over as master when it starts, # configure no_preempt rather than using equal priorities. # If no_accept is configured (or vrrp_strict # which also sets # no_accept mode), then unless the vrrp_instance has priority 255, # the system will not receive packets addressed to the # VIPs/eVIPs, # and the VIPs/eVIPs can only be used for routeing purposes. # Further, if an instance has priority 255 configured, the priority cannot # be reduced by track_scripts, track_process etc, and likewise # track_scripts etc cannot increase the priority to 255 if the configured # priority is not 255. priority 100 # VRRP Advert interval in seconds (e.g. 
0.92) (use default) advert_int 1 # Note: authentication was removed from the VRRPv2 specification by # RFC3768 in 2004. # Use of this option is non-compliant and can cause problems; avoid # using if possible, except when using unicast, where it can be helpful. authentication { # PASS|AH # PASS - Simple password (suggested) # AH - IPSEC (not recommended)) auth_type PASS # Password for accessing vrrpd. # should be the same on all machines. # Only the first eight (8) characters are used. auth_pass 1234 } # addresses add|del on change to MASTER, to BACKUP. # With the same entries on other machines, # the opposite transition will be occurring. # For virtual_ipaddress, virtual_ipaddress_excluded, # virtual_routes and virtual_rules most of the options # match the options of the command ip address/route/rule add. # The track_group option only applies to static addresses/routes/rules. # no_track is specific to keepalived and means that the # vrrp_instance will not transition out of master state # if the address/route/rule is deleted and the address/route/rule # will not be reinstated until the vrrp instance next transitions # to master. # <LABEL>: is optional and creates a name for the alias. For compatibility with "ifconfig", it should be of the form <realdev>:<anytext>, for example eth0:1 for an alias on eth0. # <SCOPE>: ("site"|"link"|"host"|"nowhere"|"global") # preferred_lft is set to 0 to deprecate IPv6 addresses (this is the # default if the address mask is /128). Use "preferred_lft forever" # to specify that a /128 address should not be deprecated. # NOTE: care needs to be taken if dev is specified for an address and # your network uses MAC learning switches. 
The VRRP protocol ensures # that the source MAC address of the interface sending adverts is # maintained in the MAC cache of switches; however by default this # will not work for the MACs of any VIPs/eVIPs that are configured on # different interfaces from the interface on which the VRRP instance is # configured, since the interface, especially if it is a VMAC interface, # will only send using the MAC address of the interface in response to # ARP requests. This may mean that the interface MAC addresses may # time out in the MAC caches of switches. In order to avoid this, use # the garp_extra_if or garp_extra_if_vmac options to send periodic # GARP/ND messages on those interfaces. virtual_ipaddress { <IPADDR>[/<MASK>] [brd <IPADDR>] [dev <STRING>] [use_vmac] [scope <SCOPE>] [label <LABEL>] [peer <IPADDR>] [home] [-nodad] [mngtmpaddr] [noprefixroute] [autojoin] [no_track] [preferred_lft nn|forever] 192.168.200.17/24 dev eth1 192.168.200.18/24 dev eth2 label eth2:1 } # VRRP IP excluded from VRRP optional. # For cases with large numbers (eg 200) of IPs # on the same interface. To decrease the number # of addresses sent in adverts, you can exclude # most IPs from adverts. # The IPs are add|del as for virtual_ipaddress. # Can also be used if you want to be able to add # a mixture of IPv4 and IPv6 addresses, since all # addresses in virtual_ipaddress must be of the # same family. virtual_ipaddress_excluded { <IPADDR>[/<MASK>] [brd <IPADDR>] [dev <STRING>] [scope <SCOPE>] [label <LABEL>] [peer <IPADDR>] [home] [-nodad] [mngtmpaddr] [noprefixroute] [autojoin] [no_track] <IPADDR>[/<MASK>] ... ... } # Set the promote_secondaries flag on the interface to stop other # addresses in the same CIDR being removed when 1 of them is removed # For example if 10.1.1.2/24 and 10.1.1.3/24 are both configured on an # interface, and one is removed, unless promote_secondaries is set on # the interface the other address will also be removed. 
promote_secondaries # routes add|del when changing to MASTER, to BACKUP. # See static_routes for more details virtual_routes { # src <IPADDR> [to] <IPADDR>/<MASK> via|gw <IPADDR> # [or <IPADDR>] dev <STRING> scope <SCOPE> table <TABLE> src 192.168.100.1 to 192.168.109.0/24 via 192.168.200.254 dev eth1 192.168.110.0/24 via 192.168.200.254 dev eth1 192.168.111.0/24 dev eth2 no_track 192.168.112.0/24 via 192.168.100.254 192.168.113.0/24 via 192.168.200.254 or 192.168.100.254 dev eth1 blackhole 192.168.114.0/24 0.0.0.0/0 gw 192.168.0.1 table 100 # To set a default gateway into table 100. } # rules add|del when changing to MASTER, to BACKUP # See static_rules for more details virtual_rules { from 192.168.2.0/24 table 1 to 192.168.2.0/24 table 1 no_track } # VRRPv3 has an Accept Mode to allow the virtual router when not the # address owner to receive packets addressed to a VIP. This is the default # setting unless strict mode is set. As an extension, this also works for # VRRPv2 (RFC 3768 doesn't define an accept mode). # -- # Accept packets to non address-owner accept # Drop packets to non address-owner. no_accept # A higher priority VRRP instance will normally preempt a lower priority instance # when it comes online. "nopreempt" stops the higher priority machine taking # over the master role, and allows the lower priority machine to remain as # master. # NOTE: For this to work, the initial state must not be MASTER. # -- nopreempt # for backwards compatibility preempt # Seconds of delay until preemption after getting the advertisement timeout # at startup or when seeing a lower priority master. # # Since it is a delay, it cannot speed up taking over as master. # "preempt_delay" specifies the time in seconds to delay preempting compared # to if "preempt_delay" is not specified. Advertisement timeout is # 3 * advert_int + skew_time. Skew_time is defined by RFC3768 and RFC5798. 
# # So if "advert_int" is 1, and priority is 128, the instance would normally # wait 3.5 seconds before taking over as master. If "preempt_delay 2" is # specified, then the delay before taking over as master would be approximately # 5.5 seconds. # # (if not disabled by "nopreempt"). # Range: 0 (default) to 1000 (e.g. 4.12) # NOTE: For this to work, the initial state must not be MASTER. preempt_delay 300 # waits 5 minutes # See description of global vrrp_skip_check_adv_addr, which # sets the default value. Defaults to vrrp_skip_check_adv_addr skip_check_adv_addr [on|off|true|false|yes|no] # See description of global vrrp_strict # If strict_mode is not specified, it takes the value of vrrp_strict. # If strict_mode without a parameter is specified, it defaults to on. strict_mode [on|off|true|false|yes|no] # Debug level, not implemented yet. # LEVEL is a number in the range 0 to 4 debug <LEVEL> # notify scripts, alert as above notify_master <STRING>|<QUOTED-STRING> [username [groupname]] notify_backup <STRING>|<QUOTED-STRING> [username [groupname]] notify_fault <STRING>|<QUOTED-STRING> [username [groupname]] # executed when stopping vrrp notify_stop <STRING>|<QUOTED-STRING> [username [groupname]] notify <STRING>|<QUOTED-STRING> [username [groupname]] # The notify_master_rx_lower_pri script is executed if a master # receives an advert with priority lower than the master's priority. notify_master_rx_lower_pri <STRING>|<QUOTED-STRING> [username [groupname]] # Send vrrp instance priority notifications on notify FIFOs. notify_priority_changes <BOOL> # Send SMTP alerts smtp_alert <BOOL> # Set socket receive buffer size (see global_defs # vrrp_rx_bufs_policy for explanation) kernel_rx_buf_size # Set use of linkbeat for the interface of this VRRP instance. This option is # deprecated - use linkbeat_interfaces block instead. 
    linkbeat_use_polling
}

Interface up/down status change debouncing
If an interface that is being used (or tracked) by a VRRP instance goes to down state, the VRRP instance(s) will, by default, immediately transition to FAULT state, and when all relevant interfaces are back up again the VRRP instance(s) will immediately transition to BACKUP state.
This can cause problems if interfaces are bouncing, and so delays can be specified between the interface state change and the transition to FAULT/BACKUP state. If the interface returns to its original state before the delay expires, no associated VRRP instance state transition will occur.

interface_up_down_delays {
    ifname down_delay [up_delay]
    ifname2 down_delay [up_delay]
    ...
}

The delays are specified in seconds, with a resolution of microseconds, e.g. a delay of 0.00001 means 10 usecs. A delay of 0 means no delay in state change. The maximum delay that can be specified is 255 seconds. If up_delay is omitted, it is set to be the same as the down delay.
The delay on an interface must be less than two (or more precisely one less than down_timer_adverts (default 3)) times the advert interval of any VRRP instance using that interface (otherwise a backup instance, while not receiving adverts may time out and become master before this instance transitions to FAULT state). Consequently the up/down delays can be dynamically reduced if another instance is master with a shorter advert interval.
If the VRRP instance is using a VMAC, it will inherit the up/down debounce delays of its parent interface.

LVS CONFIGURATION
The LVS configuration contains subblocks of Virtual server group(s) and Virtual server(s).
The subblocks contain arguments for configuring the Linux IPVS (LVS) feature. Knowledge of ipvsadm(8) will be helpful here. Configuring LVS is achieved by defining virtual server groups, virtual servers and optionally SSL configuration. Every virtual server defines a set of real servers, and you can attach healthcheckers to each real server.
Keepalived will then lead LVS operation by dynamically maintaining topology.
For details of what configuration combinations are valid, see the ipvsadm(8) man page.
Note: Where an option can be configured for a virtual server, real server, and possibly checker, the virtual server setting is the default for real servers, and the real server setting is the default for checkers.
Note: Tunnelled real/sorry servers can differ from the address family of the virtual server and non tunnelled real/sorry servers, which all have to be the same. If a virtual server uses a fwmark, and all the real/sorry servers are tunnelled, the address family of the virtual server will be the same as the address family of the real/sorry servers if they are all the same, otherwise it will default to IPv4 (use ip_family inet6 to override this).
Note: The port for the virtual server can only be omitted if the virtual service is persistent.

Virtual server group(s)
This feature offers a way to simplify your configuration by factorizing virtual server definitions. If you need to define a bunch of virtual servers with exactly the same real server topology then this feature will make your configuration much more readable, optimize the duplication of IPVS virtual servers if nftables_ipvs is used, and optimize the healthchecking task by spawning only one healthchecker, where multiple virtual server declarations would spawn a dedicated healthchecker for every real server and so waste system resources.
Any combination of IP addresses, IP address ranges and firewall marks can be used, provided that the family of the IP addresses of the virtual server group matches the IP address family of all the real servers of any virtual server using the virtual server group.
The one exception to this is that the virtual server group can be configured with both IPv4 and IPv6 addresses and fwmarks provided that all the real servers (and sorry servers) of all virtual servers using the virtual server group use tunnel forwarding; if fwmarks are specified in this case, the address family must be specified (the one exception to this is if the virtual server group has no IP addresses (i.e. fwmarks only) and all the real/sorry servers are tunnelled, it will default to IPv4; it is not good practice to rely on this and the address families of the fwmarks should be configured). Use of this option is intended for very large LVSs, but note, this can create a huge number of virtual servers unless nftables_ipvs is used. The use of nftables_ipvs is strongly recommended due to the very significant optimisations and efficiencies it provides.
NOTE: do not configure more than one TCP, one UDP and one SCTP virtual server with the same IP address family using the same virtual server group (or to put it another way do not have two virtual servers with the same protocol and address family using the same virtual server group); if all the real servers are tunnelled, then you must not have both IPv4 and IPv6 virtual servers with the same protocol.

The syntax for virtual_server_group is :

virtual_server_group <STRING> {
    # Virtual IP Address and Port
    <IPADDR> [<PORT>]
    <IPADDR> [<PORT>]
    ...

    # <IPADDR RANGE> is any of the following forms (or their IPv6 equivalents)
    # XXX.YYY.ZZZ.WWW-VVV eg 192.168.200.1-10 (includes both .1 and .10)
    # AAA.BBB.CCC.DDD-EEE.FFF.GGG.HHH eg 192.168.200.250-192.168.201.10
    # III.JJJ.KKK.LLL/nn eg 192.168.202.8/29
    <IPADDR RANGE> [<PORT>]    # VIP range [VPORT]
    <IPADDR RANGE> [<PORT>]
    ...

    # Firewall Mark (fwmark)
    # inet/inet6 should only be specified for virtual server groups
    # where all real servers of the virtual servers are tunnelled.
    fwmark <INTEGER>
    fwmark <INTEGER> [inet|inet6]
    ...
} Virtual server(s) A virtual_server can be a declaration of one of <IPADDR> [<PORT>] , fwmark <INTEGER> or group <STRING> The syntax for virtual_server is : virtual_server <IPADDR> [<PORT>] | virtual_server fwmark <INTEGER> | virtual_server group <STRING> { # LVS scheduler lvs_sched rr|wrr|lc|wlc|lblc|sh|mh|dh|fo|ovf|lblcr|sed|nq|twos # Enable flag-1 for scheduler (-b flag-1 in ipvsadm) flag-1 # Enable flag-2 for scheduler (-b flag-2 in ipvsadm) flag-2 # Enable flag-3 for scheduler (-b flag-3 in ipvsadm) flag-3 # Enable sh-port for sh scheduler (-b sh-port in ipvsadm) sh-port # Enable sh-fallback for sh scheduler (-b sh-fallback in ipvsadm) sh-fallback # Enable mh-port for mh scheduler (-b mh-port in ipvsadm) mh-port # Enable mh-fallback for mh scheduler (-b mh-fallback in ipvsadm) mh-fallback # Enable One-Packet-Scheduling for UDP (-o in ipvsadm) ops # Override default LVS forwarding method (default is NAT). # Default tunnel type is ipip. Since Linux 5.2 the GUE tunnel type can # be specified. If using GUE, a port number is required. Since Linux 5.3 # if the tunnel type is GUE, the checksum option can also be specified. # Since Linux 5.3, GRE tunnel type is also supported, but without the # remcsum option. 
lvs_method NAT|DR or lvs_method TUN [type {ipip|gue port NUM|gre} [nocsum|csum|remcsum]] # LVS persistence engine name (currently only sip supported) persistence_engine <STRING> # LVS persistence timeout in seconds, default 6 minutes persistence_timeout [<INTEGER>] # LVS granularity mask (-M in ipvsadm) persistence_granularity <NETMASK> # L4 protocol protocol TCP|UDP|SCTP # If VS IP address is not set, # suspend healthchecker's activity ha_suspend # Send email notification during quorum up/down transition, # using addresses in global_defs above (default no, # unless global smtp_alert/smtp_alert_checker set) smtp_alert <BOOL> # Default VirtualHost string for HTTP_GET or SSL_GET # eg virtualhost www.firewall.loc # Overridden by virtualhost config of real server or checker virtualhost <STRING> # snmp_name is a text string that is returned as part of the snmp # data for this virtual server. It can be used to help identify the # virtual server when parsing SNMP output. snmp_name <STRING> # On daemon startup assume that all RSs are down # and healthchecks failed. This helps to prevent # false positives on startup. Alpha mode is # disabled by default. alpha # On daemon shutdown consider quorum and RS # down notifiers for execution, where appropriate. # Omega mode is disabled by default. omega # Minimum total weight of all live servers in # the pool necessary to operate VS with no # quality regression. Defaults to 1. quorum <INTEGER> # Tolerate this much weight units compared to the # nominal quorum, when considering quorum gain # or loss. A flap dampener. Defaults to 0. hysteresis <INTEGER> # Script to execute when quorum is gained. quorum_up <STRING>|<QUOTED-STRING> [username [groupname]] # Script to execute when quorum is lost. quorum_down <STRING>|<QUOTED-STRING> [username [groupname]] # IP family for a fwmark service (only needed if all real servers are tunnelled # and persistence_granularity is not specified). Defaults to inet if not specified. 
ip_family inet|inet6 # setup realserver(s) # RS to add to LVS topology when the quorum isn't achieved. # If a sorry server is configured, all real servers will # be brought down when the quorum is not achieved and be # replaced with the sorry server. sorry_server <IPADDR> [<PORT>] # applies inhibit_on_failure behaviour to the sorry_server sorry_server_inhibit # Sorry server LVS forwarding method. Default is the virtual # server's default. # For details of tunnel type, see virtual_server details. sorry_server_lvs_method NAT|DR or sorry_server_lvs_method TUN [type {ipip|gue port NUM|gre} [nocsum|csum|remcsum]] # Optional connection timeout in seconds. # The default is 5 seconds connect_timeout <TIMER> # Retry count to make additional checks if check # of an alive server fails. Default: 1 unless specified below retry <INTEGER> # delay before retry after failure. Defaults to delay_loop for DNS_CHECK, # 3 seconds for HTTP_GET and SSL_GET, and 1 second otherwise. delay_before_retry <TIMER> # Optional random delay to start the initial check # for maximum N seconds. # Useful to scatter multiple simultaneous # checks to the same RS. Enabled by default, with # the maximum at delay_loop. Specify 0 to disable warmup <TIMER> # delay timer for checker polling (60 seconds if not specified) delay_loop <TIMER> # Set weight to 0 when healthchecker detects failure inhibit_on_failure # one entry for each realserver real_server <IPADDR> [<PORT>] { # relative weight to use, default: 1 weight <INTEGER> # LVS forwarding method # For details of tunnel type, see virtual_server details. The default # setting is taken from the virtual_server's setting. lvs_method NAT|DR or lvs_method TUN [type {ipip|gue port NUM|gre} [nocsum|csum|remcsum]] # Script to execute when healthchecker # considers service as up. notify_up <STRING>|<QUOTED-STRING> [username [groupname]] # Script to execute when healthchecker # considers service as down. 
        notify_down <STRING>|<QUOTED-STRING> [username [groupname]]

        # maximum number of connections to server
        uthreshold <INTEGER>

        # minimum number of connections to server
        lthreshold <INTEGER>

        # Send email notification during state transition,
        # using addresses in global_defs above (default yes,
        # unless global smtp_alert/smtp_alert_checker set)
        smtp_alert <BOOL>

        # Default VirtualHost string for HTTP_GET or SSL_GET
        # eg virtualhost www.firewall.loc
        # Overridden by virtualhost config of a checker
        virtualhost <STRING>

        # snmp_name is a text string that is returned as part of the snmp
        # data for this real server. It can be used to help identify the
        # real server when parsing SNMP output.
        snmp_name <STRING>

        alpha <BOOL>                    # see above
        connect_timeout <TIMER>         # see above
        retry <INTEGER>                 # see above
        delay_before_retry <TIMER>      # see above
        warmup <TIMER>                  # see above
        delay_loop <TIMER>              # see above
        inhibit_on_failure <BOOL>       # see above

        # healthcheckers. Can be multiple of each type
        # HTTP_GET|SSL_GET|TCP_CHECK|SMTP_CHECK|DNS_CHECK|MISC_CHECK|BFD_CHECK|UDP_CHECK|PING_CHECK|FILE_CHECK

        # All checkers have the following options, except MISC_CHECK which only
        # has options alpha onwards, and BFD_CHECK and FILE_CHECK which have none
        # of the standard options:
        CHECKER_TYPE {
            # ======== generic connection options
            # Optional IP address to connect to.
            # The default is the realserver IP
            connect_ip <IPADDR>

            # Optional port to connect to
            # The default is the realserver port
            connect_port <PORT>

            # Optional address to use to
            # originate the connection
            bindto <IPADDR>

            # Optional interface to use; needed if
            # the bindto address is IPv6 link local
            bind_if <IFNAME>

            # Optional source port to
            # originate the connection from
            bind_port <PORT>

            # Optional fwmark to mark all outgoing
            # checker packets with
            fwmark <INTEGER>

            alpha <BOOL>                    # see above
            connect_timeout <TIMER>         # see above
            retry <INTEGER>                 # see above
            delay_before_retry <TIMER>      # see above
            warmup <TIMER>                  # see above
            delay_loop <TIMER>              # see above
            log_all_failures <BOOL>         # log all failures when checker up
        }

        # The following options are additional checker specific

        # HTTP and SSL healthcheckers
        HTTP_GET|SSL_GET {
            # HTTP protocol version, one of 1.0, 1.0C, 1.1
            # Protocol version 1.0C means version 1.0 with the addition
            # of a "Connection: close" line, which is included in
            # version 1.1 by default.
            http_protocol <PROTOCOL>

            # When alpha mode is set, or when recovering from a failure,
            # each URL is checked, with a delay of <delay_loop> between
            # each check. if there were 20 URLs, and the <delay_loop> were
            # 3 seconds, it would take 1 minute before the RS would come up
            # following startup, or recovery from a failure. Setting
            # fast_recovery removes the delay, both at start up and after
            # recovery from a failure, meaning that the RS will come up
            # once all the URLs have been checked, with no delay between
            # checking each URL.
            fast_recovery [<BOOL>]

            # An url to test
            # can have multiple entries here
            url {
                #eg path / , or path /mrtg2/
                path <STRING>

                # healthcheck needs digest
                # or status_code and digest
                # Digest computed with genhash
                # eg digest 9b3a0c85a887a256d6939da88aabd8cd
                digest <STRING>

                # status code returned in the HTTP header
                # eg status_code 200 or status_code 200-299 400-499 503 505
                # Default is 200-299
                status_code <INTEGER|RANGE> [<INTEGER|RANGE>] ...
                # VirtualHost string. eg virtualhost www.firewall.loc
                # If not set, uses virtualhost from real or virtual server
                virtualhost <STRING>

                # Regular expression to search returned data against.
                # A failure to match causes the check to fail.
                regex <STRING>

                # Reverse the sense of the match, so a match of the
                # returned text causes the check to fail.
                regex_no_match

                # Space separated list of options for regex.
                # See man pcre2api for a description of the options.
                # The following options are supported:
                #   allow_empty_class alt_bsux auto_callout caseless
                #   dollar_endonly dotall dupnames extended firstline
                #   match_unset_backref multiline never_ucp never_utf
                #   no_auto_capture no_auto_possess no_dotstar_anchor
                #   no_start_optimize ucp ungreedy utf never_backslash_c
                #   alt_circumflex alt_verbnames use_offset_limit
                regex_options <OPTIONS>

                # For complicated regular expressions a larger stack
                # may be needed, and this allows the start and maximum
                # sizes in bytes to be specified. For more details see
                # the documentation for pcre2_jit_stack_create()
                regex_stack <START> <MAX>

                # The minimum offset into the returned data to start
                # checking for the regex pattern match. This can save
                # processing time if the returned data is large.
                regex_min_offset <OFFSET>

                # The maximum offset into the returned data for the
                # start of the subject match.
                regex_max_offset <OFFSET>
            }
        }

        SSL_GET {
            # when provided, send Server Name Indicator during SSL handshake
            enable_sni
        }

        # TCP healthchecker
        TCP_CHECK {
            # No additional options
        }

        # SMTP healthchecker
        SMTP_CHECK {
            # Optional string to use for the SMTP HELO request
            helo_name <STRING>|<QUOTED-STRING>
        }

        # DNS healthchecker. Uses UDP protocol.
        DNS_CHECK {
            # The retry default is 3.

            # DNS query type
            #   A|NS|CNAME|SOA|MX|TXT|AAAA
            # The default is SOA
            type <STRING>

            # Domain name to use for the DNS query
            # The default is . (dot)
            name <STRING>
        }

        # MISC healthchecker, run a program
        MISC_CHECK {
            # The retry default is 0.
            # External script or program
            misc_path <STRING>|<QUOTED-STRING>

            # Script execution timeout
            misc_timeout <INTEGER>

            # If misc_dynamic is set, the exit code from healthchecker
            # is used to dynamically adjust the weight as follows:
            #   exit status 0: svc check success, weight
            #     unchanged.
            #   exit status 1: svc check failed.
            #   exit status 2-255: svc check success,
            #     then the RS weight is increased by
            #     (exit status - 2 - rs configured weight).
            #   An exit status of 10 will set the RS weight to 10. If
            #   the exit status subsequently changes to 20, the RS
            #   weight will become 20.
            #   If there is only one MISC_CHECK and no FILE_CHECKers
            #   the effect is to set the RS weight to two less than
            #   the exit status.
            #   (for example: an exit status of 255 would set
            #   weight to 253 if no other MISC_CHECKers or
            #   FILE_CHECKers were configured on the RS)
            misc_dynamic

            # Specify the username/groupname that the script should
            # be run under.
            # If GROUPNAME is not specified, the group of the user
            # is used
            user USERNAME [GROUPNAME]
        }

        # BFD instance name to check
        BFD_CHECK {
            name <STRING>
        }

        # PING healthchecker
        # Note: using this checker may cause /proc/sys/net/ipv4/ping_group_range to be
        # updated to allow root to use an IPPROTO_ICMP socket.
        PING_CHECK {
            # No additional options
        }

        # UDP healthchecker
        # Note: for this checker to work properly, it relies on ICMP error messages such as
        # HOST_UNREACH, NET_UNREACH, PORT_UNREACH. HOST_UNREACH relies on ARP requests
        # timing out, and so connect_timeout should be long enough to allow for this (e.g.
        # at least 4 seconds).
        # If payload is specified, the HEX_STR will be sent as the UDP data, otherwise a
        # random payload will be sent.
        # If require_reply is specified, the received data length is checked to ensure that it
        # lies between min_reply_length and max_reply_length.
        # If require_reply without a hex string is specified, udp reply data must be received
        # but the data content is not checked.
        # If a require_reply HEX_STR is specified, the reply data will be checked against the
        # HEX_STR, which must match up to the minimum of the received data length and the length
        # of the require_reply HEX_STR.
        # The format of HEX_STR is quite free format, for example:
        #   Ab12f 3 456 546443123
        # would be interpreted as:
        #   AB 12 0F 03 45 06 54 64 43 12 03
        # For the require_reply HEX_STR, a character can be specified as X or x, in which case
        # the value of those 4 bits in the reply is ignored. This allows, for example, for
        # some form of counter or otherwise.
        # It may be that you will want to use PING_CHECK to the same server as well.
        UDP_CHECK {
            payload <HEX_STR>
            require_reply [<HEX_STR>]       # Require a reply packet for check to be successful
            min_reply_length <INT>          # default 0
            max_reply_length <INT>          # default is 255
        }

        # File checker
        # This reads and monitors the contents of a file, where STRING is the name specified
        # in the track_file configuration block (see above).
        FILE_CHECK {
            track_file <STRING>

            # If dynamic is set, the value from the file is used
            # to dynamically adjust the weight by adding the weight
            # to the quorum and the LVS weight
            dynamic

            # The weight multiplier to apply to the value read from the file
            weight <-2147483647..2147483647> [reverse]
        }
    }
}

# Parameters used for SSL_GET check.
# If none of the parameters are specified, the SSL context
# will be auto generated.
SSL {
    # Password
    password <STRING>

    # CA file
    ca <STRING>

    # Certificate file
    certificate <STRING>

    # Key file
    key <STRING>
}

ADVANCED CONFIGURATION

Configuration parser has been extended to support advanced features such as conditional configuration and parameter substitution. These features are very useful for any scripted environment where configuration templates are generated (datacenters).

Conditional configuration and configuration id

The config-id defaults to the first part of the node name as returned by uname, and can be overridden with the -i or --config-id command line option.
Any configuration line starting with '@' is a conditional configuration line. The word immediately following (i.e. without any space) the '@' character is compared against the config-id, and if they don't match, the configuration line is ignored.

Alternatively, '@^' is a negative comparison, so if the word immediately following does NOT match the config-id, the configuration line IS included.

The purpose of this is to allow a single configuration file to be used for multiple systems, where the only differences are likely to be the router_id, vrrp instance priorities, and possibly interface names and unicast addresses.

For example:

global_defs {
    @main router_id main_router
    @backup router_id backup_router
}
...
vrrp_instance VRRP {
    ...
    @main unicast_src_ip 1.2.3.4
    @backup unicast_src_ip 1.2.3.5
    @backup2 unicast_src_ip 1.2.3.6
    unicast_peer {
        @^main 1.2.3.4
        @^backup 1.2.3.5
        @^backup2 1.2.3.6
    }
    ...
}

If keepalived is invoked with -i main, then the router_id will be set to main_router, if invoked with -i backup, then backup_router, if not invoked with -i, or with -i anything else, then the router_id will not be set. The unicast peers for main will be 1.2.3.5 and 1.2.3.6.

Parameter substitution

Substitutable parameters can be specified. The format for defining a parameter is:

$PARAMETER=VALUE

where there must be no space before the '=' and only whitespace may precede the '$'. Empty values are allowed.

Parameter names can be made up of any combination of A-Za-z0-9 and _, but cannot start with a digit. Parameter names starting with an underscore should be considered reserved names that keepalived will define for various pre-defined options.

After a parameter is defined, any occurrence of $PARAMETER followed by whitespace, or any occurrence of ${PARAMETER} (which need not be followed by whitespace) will be replaced by VALUE.
Replacement is recursive, so that if a parameter value itself includes a replaceable parameter, then after the first substitution, the parameter in the value will then be replaced; the substitution is done at replacement time and not at definition time, so for example:

$ADDRESS_BASE=10.2.${ADDRESS_BASE_SUB}
$ADDRESS_BASE_SUB=0
${ADDRESS_BASE}.100/32
$ADDRESS_BASE_SUB=10
${ADDRESS_BASE}.100/32

will produce:

10.2.0.100/32
10.2.10.100/32

Note in the above examples the use of both ADDRESS_BASE and ADDRESS_BASE_SUB required braces ({}) since the parameters were not followed by whitespace (after the first substitution which produced 10.2.${ADDRESS_BASE_SUB}.100/32 the parameter is still not followed by whitespace).

If a parameter is not defined, it will not be replaced at all, so for example ${UNDEF_PARAMETER} will remain in the configuration if it is undefined; this means that existing configuration that contains a '$' character (for example in a script definition) will not be changed so long as no new parameter definitions are added to the configuration.

Parameter substitution works in conjunction with conditional configuration. For example:

@main $PRIORITY=240
@backup $PRIORITY=200
...
vrrp_instance VI_0 {
    priority $PRIORITY
}

will produce:

...
vrrp_instance VI_0 {
    priority 240
}

if the config_id is main.

$IF_MAIN=@main
$IF_MAIN priority 240

will produce:

priority 240

if the config_id is main and nothing if the config_id is not main, although why anyone would want to use this rather than simply the following is not known (but still possible):

@main priority 240

Multiline definitions are also supported, but when used there must be nothing on the line after the parameter name. A multiline definition is specified by ending each line except the last with a '\' character.
Example:

$INSTANCE= \
vrrp_instance VI_${NUM} { \
    interface eth0.${NUM} \
    use_vmac vrrp${NUM}.1 \
    virtual_router_id 1 \
    @high priority 130 \
    @low priority 120 \
    advert_int 1 \
    virtual_ipaddress { \
        10.0.${NUM}.254/24 \
    } \
    track_script { \
        offset_instance_${NUM} \
    } \
}

$NUM=0
$INSTANCE

$NUM=1
$INSTANCE

The use of multiline definitions can be nested.

Example:

$RS= \
real_server 192.168.${VS_NUM}.${RS_NUM} 80 { \
    weight 1 \
    inhibit_on_failure \
    smtp_alert \
    MISC_CHECK { \
        misc_path "${_PWD}/scripts/vs.sh RS_misc.${INST}.${VS_NUM}.${RS_NUM}.0 10.0.${VS_NUM}.4:80->192.168.${VS_NUM}.${RS_NUM}:80" \
    } \
    MISC_CHECK { \
        misc_path "${_PWD}/scripts/vs.sh RS_misc.${INST}.${VS_NUM}.${RS_NUM}.1 10.0.${VS_NUM}.4:80->192.168.${VS_NUM}.${RS_NUM}:80" \
    } \
    notify_up "${_PWD}/scripts/notify.sh RS_notify.${INST}.${VS_NUM}.${RS_NUM} UP 10.0.${VS_NUM}.4:80->192.168.${VS_NUM}.${RS_NUM}:80" \
    notify_down "${_PWD}/scripts/notify.sh RS_notify.${INST}.${VS_NUM}.${RS_NUM} DOWN 10.0.${VS_NUM}.4:80->192.168.${VS_NUM}.${RS_NUM}:80" \
}

$VS= \
virtual_server 10.0.${VS_NUM}.4 80 { \
    quorum 2 \
    quorum_up "${_PWD}/scripts/notify.sh VS_notify.${INST} UP 10.0.${VS_NUM}.4:80" \
    quorum_down "${_PWD}/scripts/notify.sh VS_notify.${INST} DOWN 10.0.${VS_NUM}.4:80" \
    $RS_NUM=1 \
    $RS \
    $RS_NUM=2 \
    $RS \
    $RS_NUM=3 \
    $RS \
}

$VS_NUM=0
$ALPHA=alpha
$VS

$VS_NUM=1
$ALPHA=
$VS

The above will create 2 virtual servers, each with 3 real servers

Pre-defined definitions

The following definitions are pre-defined:

${_PWD} : The directory of the current configuration file (this can be changed if using the include directive).
${_INSTANCE} : The instance name (as defined by the -i option, defaults to hostname).
${_RANDOM [MIN [MAX]]} : This is replaced by a random integer in the range [MIN, MAX], where MIN and MAX are optional non-negative integers. Defaults are MIN=0 and MAX=32767.
${_HASH} : This is replaced by a '#' character, which would otherwise start a comment
${_BANG} : This is replaced by a '!'
character, which would otherwise start a comment

Additional pre-defined definitions will be added as their need is identified. It will normally be quite straightforward to add additional pre-defined definitions, so if you need one, or have a good idea for one, then raise an issue at https://github.com/acassen/keepalived/issues requesting it.

Sequence blocks

A line starting ~SEQ(var, start, step, end) will cause the remainder of the line to be processed multiple times, with the variable $var set initially to start, and then $var will be incremented by step repeatedly, terminating when it is greater than end. step may be omitted, in which case it defaults to 1 or -1, depending on whether end is greater or less than start. start may also be omitted, in which case it defaults to 1 if end > 0 or -1 if end < 0.

~SEQx(...) is the same as ~SEQ(...), except the variable $var will be formatted in hexadecimal, which would be useful for IPv6 addresses.

Note: At the moment it is necessary to use different variables for the ~SEQ block from any previously defined variable, including one used as the variable in a previous ~SEQ block. This may change in the future, so do not rely on a ~SEQ block variable being defined after the end of the block.
Examples:

~SEQ(SUBNET, 0, 3) ip_address 10.0.${SUBNET}.1

would produce:

ip_address 10.0.0.1
ip_address 10.0.1.1
ip_address 10.0.2.1
ip_address 10.0.3.1

and

~SEQx(SUBNET, 144, 16, 192) ip_address fe80::20:${SUBNET}:1

or better

~SEQx(SUBNET, 0x90, 0x10, 0xc0) ip_address fe80::20:${SUBNET}:1

would produce:

ip_address fe80::20:90:1
ip_address fe80::20:a0:1
ip_address fe80::20:b0:1
ip_address fe80::20:c0:1

Another example:

virtual_ipaddress {
    ~SEQx(AD2, 0x90, 0x10, 0xc0) ~SEQx(AD1, 0x12, -1, 0x0c) fe81::10:${AD2}:${AD1}
}

There can be multiple ~SEQ elements on a line, so for example:

$VI4= \
track_file offset_instance_4.${IF}.${NUM}.${ID} { \
    file "${_PWD}/679/track_files/4.${IF}.${NUM}.${ID}" \
    weight -100 \
} \
vrrp_instance vrrp4.${IF}.${NUM}.${ID} { \
    interface bond${IF}.${NUM} \
    use_vmac vrrp4.${IF}.${NUM}.${ID} \
    virtual_router_id ${ID} \
    priority 130 \
    virtual_ipaddress { \
        10.${IF}.${NUM}.${ID}/24 \
    } \
    track_file { \
        offset_instance_4.${IF}.${NUM}.${ID} \
    } \
}

~SEQ(IF,0,7) ~SEQ(NUM,0,31) ~SEQ(ID,1,254) $VI4

will produce 65024 vrrp instances with names from vrrp4.0.0.1 through to vrrp4.7.31.254.

List blocks

List blocks are similar to sequence blocks, except that the values to substitute into the variable are listed in the ~LST specification.

A line starting ~LST(var, val1, val2, val3) will cause the remainder of the line to be processed multiple times, with the variable $var set initially to val1, and then val2, and finally val3. Any number of values can be specified, as long as at least one value is (although only one value would be pointless).

If it is desired to substitute more than one variable at a time, the variables and values need to be enclosed in {...} blocks.
For example:

~LST({IP, IP1}, {10,1},{20,4},{5,6},{12,8}) 192.168.${IP}.${IP1}

would first set IP=10 and IP1=1, then IP=20 and IP1=4, etc, and produces:

192.168.10.1
192.168.20.4
192.168.5.6
192.168.12.8

List blocks can be nested, so:

~LST(IP, 1, 2, 3, 4) ~LST(IP1, 5,6,7) 192.169.${IP}.${IP1}

produces:

192.169.1.5
192.169.1.6
192.169.1.7
192.169.2.5
192.169.2.6
192.169.2.7
192.169.3.5
192.169.3.6
192.169.3.7
192.169.4.5
192.169.4.6
192.169.4.7

Finally, list blocks and sequence blocks can be combined, so:

~LST({IP, IP1}, {10,1},{20,4},{5,6},{12,8}) ~SEQ(IP2,168,2,172) 192.${IP2}.${IP}.${IP1}

produces:

192.168.10.1
192.170.10.1
192.172.10.1
192.168.20.4
192.170.20.4
192.172.20.4
192.168.5.6
192.170.5.6
192.172.5.6
192.168.12.8
192.170.12.8
192.172.12.8

KERNEL SETTINGS

It has been identified that if proxy_arp and proxy_arp_pvlan are enabled on an interface that has VIPs or eVIPs configured on it, it can cause incorrect replies to ARP requests due to the proxy replying to the ARP request as well as the keepalived host. Both need to be set to 0 to function properly.

AUTHORS

Initial by Joseph Mack. Extensive updates by Alexandre Cassen & Quentin Armitage.

SEE ALSO

ipvsadm(8), ip --help.

Keepalived 2022-01-15 keepalived.conf(5)
Let's create the script /opt/postfix.check.sh. It must return exit code 0 when the Postfix service is running normally.
#!/bin/bash
systemctl is-active postfix > /dev/null 2>&1
Make the postfix user the owner of the script and let only that user run it:
chown postfix: /opt/postfix.check.sh
chmod 700 /opt/postfix.check.sh
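The check above only asks systemd for the unit state. As a rough sketch, not part of the original setup, the script could also confirm that the SMTP port really answers with a 220 greeting. The function names, the --run switch, the 127.0.0.1:25 address and the 5-second timeout are all assumptions made for illustration; the probe relies on bash's /dev/tcp pseudo-device and on timeout(1) from coreutils.

```shell
#!/bin/bash
# Hypothetical extended health check for keepalived (illustrative sketch only).

# Succeeds only if the given SMTP greeting line starts with reply code 220.
smtp_greeting_ok() {
    case "$1" in
        "220 "*|"220-"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the first line sent by the SMTP server at host $1, port $2.
# timeout(1) bounds both the connect and the read (5 s is an assumed value).
read_smtp_greeting() {
    local line
    line=$(timeout 5 bash -c \
        'exec 3<>"/dev/tcp/$1/$2" && IFS= read -r l <&3 && printf "%s" "$l"' \
        _ "$1" "$2" 2>/dev/null) || return 1
    # Strip the trailing carriage return of the CRLF line ending.
    printf '%s\n' "${line%$'\r'}"
}

# Run the live checks only when called with --run, so the helper
# functions above can also be sourced and exercised on their own.
if [ "${1:-}" = "--run" ]; then
    systemctl is-active --quiet postfix || exit 1
    smtp_greeting_ok "$(read_smtp_greeting 127.0.0.1 25)" || exit 1
    exit 0
fi
```

With this variant the vrrp_script line would become script "/opt/postfix.check.sh --run". It also assumes Postfix listens on the loopback address, which depends on inet_interfaces in main.cf.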
Starting the cluster
On the master:
systemctl enable keepalived
systemctl start keepalived
systemctl status keepalived
ip a
The output now shows a second address on the interface: the virtual IP 10.12.49.37.
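Instead of reading the ip a output by eye, the same check can be scripted. The sketch below uses has_vip, a hypothetical helper name, to parse the one-line-per-address format of ip -o -4 addr show from standard input and compare each address exactly, so 10.12.49.137 is not mistaken for 10.12.49.37.

```shell
#!/bin/bash
# Returns 0 if IPv4 address $1 appears in `ip -o -4 addr show` output on stdin.
# In that format the fourth field is ADDRESS/PREFIX, e.g. 10.12.49.37/32.
has_vip() {
    awk -v vip="$1" '
        { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }
    '
}

# Usage on a node (interface name taken from the keepalived.conf above):
#   ip -o -4 addr show dev ens160 | has_vip 10.12.49.37 && echo "VIP is on this node"
```

On the master the usage line should print the message; on the backup it should print nothing while the master is healthy.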
On the backup:
systemctl enable keepalived
systemctl start keepalived
systemctl status keepalived
ip a
Check on the backup that VRRP is working:
Capture the VRRP packets with tcpdump
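One possible capture command is sketched below. VRRP is IP protocol 112, and since this cluster uses unicast peers, the advertisements arrive from the master's address rather than from a multicast group. vrrp_filter is a hypothetical helper that only builds the pcap filter string; the capture count of 5 is an arbitrary choice.

```shell
#!/bin/bash
# Builds a pcap filter that matches VRRP advertisements (IP protocol 112)
# sent by the peer whose address is given as $1.
vrrp_filter() {
    printf 'ip proto 112 and src host %s' "$1"
}

# On the backup node, as root, capture five advertisements from the master:
#   tcpdump -ni ens160 -c 5 "$(vrrp_filter 10.12.49.35)"
```

While the master is healthy, roughly one packet per second should appear, matching advert_int 1.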
To test keepalived, stop the postfix service on the master.
systemctl stop postfix
The virtual IP moves to the second node.
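Why the address moves comes down to priority arithmetic. With a positive weight in vrrp_script, keepalived adds the weight to the instance priority while the script exits 0; with a negative weight it subtracts it while the script fails. The sketch below reproduces that rule for the master's values (priority 100, weight 100). The backup's base priority of 90 is a made-up value for illustration, so substitute the one from your backup's keepalived.conf. The clamp to 1..254 reflects our understanding of how keepalived bounds the result.

```shell
#!/bin/bash
# effective_priority BASE WEIGHT SCRIPT_OK
#   BASE      - priority from vrrp_instance
#   WEIGHT    - weight from vrrp_script
#   SCRIPT_OK - 1 if the check script exited 0, otherwise 0
effective_priority() {
    local base="$1" weight="$2" ok="$3"
    local prio="$base"
    if [ "$weight" -gt 0 ] && [ "$ok" -eq 1 ]; then
        prio=$((base + weight))      # positive weight: bonus while healthy
    elif [ "$weight" -lt 0 ] && [ "$ok" -eq 0 ]; then
        prio=$((base + weight))      # negative weight: penalty while failing
    fi
    # Keep the result inside the range usable by a non-owner VRRP router.
    if [ "$prio" -lt 1 ]; then prio=1; fi
    if [ "$prio" -gt 254 ]; then prio=254; fi
    echo "$prio"
}

echo "master, postfix up:   $(effective_priority 100 100 1)"   # 200
echo "master, postfix down: $(effective_priority 100 100 0)"   # 100
echo "backup, postfix up:   $(effective_priority 90 100 1)"    # 190 (assumed base 90)
```

Under these assumptions the master wins with 200 against 190 while Postfix runs, and loses with 100 against 190 once it is stopped, which is the failover observed above.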
Links
Installing Postfix on Ubuntu 20.04 LTS in a minimal configuration