[Pacemaker] designing a load balancer - request for comments

Mon Feb 14 09:46:24 EST 2011

On 14.02.2011 14:45, Raoul Bhatia [IPAX] wrote:
> On 02/14/2011 02:37 PM, Klaus Darilion wrote:
>> Somehow pacemaker does not react as I would expect it. My config is:
>>
>> primitive failover-ip ocf:heartbeat:IPaddr \
>>         params ip="83.136.32.161" \
>>         op monitor interval="3s"
>> primitive kamailio lsb:kamailio \
>>         meta migration-threshold="2" failure-timeout="60" \
>>         op monitor interval="15" timeout="15"
>> clone cloneKamailio kamailio
>> colocation colo_ip_with_kamailio inf: failover-ip cloneKamailio
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>>         cluster-infrastructure="openais" \
>>         expected-quorum-votes="2" \
>>         stonith-enabled="false" \
>>         no-quorum-policy="ignore"
>> rsc_defaults $id="rsc-options" \
>>         resource-stickiness="5"
> ...
>> So, what am I doing wrong? I would expect that after 60s the
>> failure-count is reset.
> 
> there is no "cluster-recheck-interval" in your properties:
> 
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>         stonith-enabled="true" \
>         cluster-infrastructure="openais" \
> ...
>         cluster-recheck-interval="1min"
> 
> try to set this and redo your testing.

Ah, interesting :-)

But it still does not behave as expected. On cluster recheck, pacemaker
detects that the failure timeout has expired:

notice: get_failcount: Failcount for cloneKamailio on armani has expired (limit was 60s)
notice: RecurringOp:  Start recurring monitor (15s) for kamailio:0 on armani

So, Kamailio gets restarted after the failure-timeout, but the
failure-count is still not reset.

virtual-IP on server1, Kamailio on server1 and server2
server1 failure count: 0
server2 failure count: 0

Then I stop Kamailio on server1 --> pacemaker restarts Kamailio

virtual-IP on server1, Kamailio on server1 and server2
server1 failure count: 1
server2 failure count: 0

Then I stop Kamailio on server1 again --> pacemaker migrates the IP

virtual-IP on server2, Kamailio on server2
server1 failure count: 2
server2 failure count: 0

After failure-timeout, Kamailio gets restarted:

virtual-IP on server2, Kamailio on server1 and server2
server1 failure count: 2
server2 failure count: 0

Then server2 is set to standby --> the IP is migrated to server1

virtual-IP on server1, Kamailio on server1
server1 failure count: 2
server2 failure count: 0

Then server2 is set online again:

virtual-IP on server1, Kamailio on server1 and server2
server1 failure count: 2
server2 failure count: 0

Then I stop Kamailio on server1 --> pacemaker migrates the IP

virtual-IP on server2, Kamailio on server2
server1 failure count: 3
server2 failure count: 0

After the failure-timeout I would have expected everything to start from
the beginning: the failure-count would be reset to 0, and it would again
take 2 failures (the migration-threshold) to trigger a migration.
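
To make that expectation concrete, here is a minimal Python sketch of the
behaviour I had in mind. It is only a toy model of my expectation, not
Pacemaker's actual implementation: the class FailTracker and its method
names are made up for illustration, and the timestamps are arbitrary
seconds. The defaults mirror migration-threshold="2" and
failure-timeout="60" from the config above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailTracker:
    """Toy per-node model of the expected behaviour (hypothetical, not Pacemaker code)."""
    migration_threshold: int = 2       # mirrors migration-threshold="2"
    failure_timeout: float = 60.0      # mirrors failure-timeout="60", in seconds
    failcount: int = 0
    last_failure: Optional[float] = None

    def expire(self, now: float) -> None:
        """Called on each recheck: forget failures older than failure_timeout."""
        if self.last_failure is not None and now - self.last_failure >= self.failure_timeout:
            self.failcount = 0
            self.last_failure = None

    def record_failure(self, now: float) -> None:
        """Count one failure, after first dropping any expired ones."""
        self.expire(now)
        self.failcount += 1
        self.last_failure = now

    def banned(self) -> bool:
        """True once the node has reached the threshold and resources should move away."""
        return self.failcount >= self.migration_threshold


server1 = FailTracker()
server1.record_failure(now=0)     # first stop: restarted in place, count 1
server1.record_failure(now=10)    # second stop: threshold reached, IP moves
server1.expire(now=70)            # recheck after the timeout: count back to 0
server1.record_failure(now=100)   # next stop: expected count 1, not 3
```

In this model the last failure leaves server1 at a count of 1 and still
eligible, whereas in my test above the count went to 3 and the IP was
migrated right away.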

regards
Klaus