[Pacemaker] How to speed up failover on node failure and network outage

Dejan Muhamedagic dejanmm at fastmail.fm
Fri Feb 18 09:20:05 EST 2011


Hi,

On Fri, Feb 18, 2011 at 02:45:13PM +0100, Frederik Schüler wrote:
> Hello *,
> 
> I have an interesting problem at a customer installation site:
> 
> 1. The failover on node failure (unplugging the power cords) takes about 20s.
> 2. The failover on network outage (unplugging the network cable of the active 
> node) takes about 40s.
> 
> The setup is as follows:
> 
> heartbeat 3.0.3 from Debian "Squeeze"
> pacemaker 1.0.9.1 from Debian "Squeeze"
> 
> - 2 nodes, two network connections between the nodes for hb
> - a drbd master-slave
> - a group, started on the drbd master, with following components:
>   * a filesystem (on the drbd)
>   * an IP address
>   * a postgresql database
>   * two LSB scripts
> - an ocf:pacemaker:ping clone on both nodes to detect network outages
> 
> 
> A failover time of about 2-3s for both node and network failure is required by 
> the customer.
> 
> This requirement stems from the setup that existed before drbd, postgresql 
> etc. were added:
> 
> A heartbeat-2 setup with one group, containing only one IP Address and an LSB 
> script, with single network connection between the nodes, no pingd/ipfail 
> setup. The deadtime was set to 2s, so the cluster would indeed failover within 
> 2-3s on node failure. A network outage would have caused a split-brain 
> situation, and the standby node to go active within 2-3s.
> 
> Now, with drbd in place, abusing the split-brain situation this way is out of 
> the question, but the fast failover time is still required.
> 
> 
> Is it possible to substantially speed up the failover times?
> 
> 
> Basically, I am looking for one of the following possibilities:
> 
> 1. It is possible to get the times down, by tuning the configuration or by 
> using some patches from hg (I noticed a lot of "speedup enhancements" in 
> pacemaker 1.2)

I think that those performance fixes are mainly for scalability.
If your configuration is not big, then it won't make much of a
difference.

> 2. It could be done, but it requires some development work - my customer is 
> willing to pay for development work on this issue.
> 
> 3. It is not possible, given the way heartbeat/pacemaker currently works 
> internally.

You should first identify where the time is being spent on
failover. I think that it is possible to reduce it, but the
question is whether it is safe to do so, in particular in case
of split brain or when a node just disappears. False positives
are to be avoided. It certainly won't make your customer happy
to see the cluster reducing availability.
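A quick way to see where the time goes is to diff the timestamps in the
heartbeat log between the "node dead" detection and the last resource start
of a test failover. A minimal sketch follows; the log lines here are
fabricated placeholders, and the real log path and message strings depend
on your ha.cf logfile setting and resource names:

```shell
# Sketch: estimate failover duration from the cluster log (GNU date assumed).
# The sample log below is fabricated; on the real cluster point the awk
# commands at the heartbeat logfile (e.g. /var/log/ha-log) after a test.
cat > /tmp/ha-sample.log <<'EOF'
Feb 18 14:00:01 rollenserver2 heartbeat: WARN: node rollenserver1: is dead
Feb 18 14:00:19 rollenserver2 crmd: info: Operation pgsql_start_0: ok
EOF

# Timestamp of failure detection and of the final resource start:
t0=$(date -d "$(awk '/is dead/     {print $1" "$2" "$3}' /tmp/ha-sample.log)" +%s)
t1=$(date -d "$(awk '/pgsql_start/ {print $1" "$2" "$3}' /tmp/ha-sample.log)" +%s)

echo "failover took $((t1 - t0))s"   # prints: failover took 18s
```

Whatever the biggest gap turns out to be (detection, DC election, or a slow
resource start such as postgresql recovery) is the place worth tuning first.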

Thanks,

Dejan

> Best regards
> Frederik Schüler
> 
> -- 
> five times nine                              keep your business safe.
> Inhaber: Frederik Schueler              Kirschgarten 15 21031 Hamburg
> Tel: 040 219 84 844                             Mobil: 0170 298 28 47
> Web: http://fivetimesnine.de/                    USt ID: DE-254646986

> node $id="80e49a8c-48f9-4b83-98ed-247c3379c637" rollenserver1
> node $id="d074ae53-cf19-4914-b93e-5ea478674856" rollenserver2
> primitive IP ocf:heartbeat:IPaddr2 \
>         params ip="10.212.4.250" nic="eth0" cidr_netmask="24" \
>         op monitor interval="10s" timeout="20s" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="60s"
> primitive drbd ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         op monitor interval="120s" timeout="60s" \
>         op start interval="0" timeout="240s" \
>         op stop interval="0" timeout="100s"
> primitive fs ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/var/lib/postgresql/8.1/ha" fstype="ext3" options="noatime"
> primitive pgsql ocf:heartbeat:pgsql \
>         params pgctl="/usr/lib/postgresql/8.1/bin/pg_ctl" psql="/usr/lib/postgresql/8.1/bin/psql" pgdata="/var/lib/postgresql/8.1/ha" pgport="5433" pgdb="postgres" start_opt="-c config_file=/etc/postgresql/8.1/ha/postgresql.conf" logfile="/var/log/postgresql/postgresql-8.1-ha.log" \
>         op monitor interval="120s" timeout="60s" \
>         op start interval="0" timeout="120s" \
>         op stop interval="0" timeout="120s"
> primitive ping ocf:pacemaker:ping \
>         params host_list="10.212.4.242" dampen="2s" \
>         op monitor interval="3s" timeout="5s" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="60s"
> primitive rollenserver lsb:Rollenserver
> primitive smsrelay lsb:SMSRelay
> group Rollenserver IP rollenserver fs pgsql smsrelay
> ms ms-drbd drbd \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> clone pingclone ping \
>         meta globally-unique="false" target-role="Started"
> location ms-drbd_master_on_connected_node ms-drbd \
>         rule $id="ms-drbd_master_on_connected_node-rule" $role="master" -2000: not_defined pingd or pingd lte 0
> colocation rollenserver_on_drbd inf: Rollenserver ms-drbd:Master
> order rollenserver_after_drbd inf: ms-drbd:promote Rollenserver:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>         cluster-infrastructure="Heartbeat" \
>         expected-quorum-votes="2" \
>         no-quorum-policy="ignore" \
>         stonith-enabled="false" \
>         default-resource-stickiness="100" \
>         last-lrm-refresh="1297176953" \
>         dc-deadtime="60s" \
>         symmetric-cluster="true"
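For reference, the knobs that mostly govern the detection times in a setup
like this are heartbeat's deadtime/keepalive in ha.cf plus the ping
resource's monitor interval and dampen. A sketch with purely illustrative
values (not a tested recommendation - aggressive values invite exactly the
false positives warned about above):

```
# /etc/ha.d/ha.cf (fragment) -- values are illustrative assumptions
keepalive 500ms   # heartbeat interval between the nodes
warntime  1       # log a warning after 1s of silence
deadtime  2       # declare the peer dead after 2s of silence
initdead  30      # allow much longer at boot time
```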




> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




