[Pacemaker] detecting resource failures after maintenance

Jeffrey Lewis jlewis at 42lines.net
Fri May 10 15:53:00 UTC 2013


It seems pacemaker is not properly detecting resource failures after
maintenance.  Example follows.

Pacemaker is managing two IPaddr2 resources.  Both resources are
online, and all is well.

jlewis at qa3db22:~$ sudo crm resource show
 shard0_ip (ocf::heartbeat:IPaddr2) Started
 shard1_ip (ocf::heartbeat:IPaddr2) Started

I decide to do some maintenance and set is-managed-default=false.
This way, pacemaker will continue monitoring all resources, but will
not take action should a resource fail.

jlewis at qa3db22:~$ sudo crm configure property is-managed-default=false

jlewis at qa3db23:~$ sudo crm resource show
 shard0_ip (ocf::heartbeat:IPaddr2) Started  (unmanaged)
 shard1_ip (ocf::heartbeat:IPaddr2) Started  (unmanaged)

I then take resource 'shard1_ip' offline using ifconfig.  Pacemaker
correctly notices that this resource has failed.

jlewis at qa3db23:~$ sudo ifconfig eth0:shard1 down

jlewis at qa3db23:~$ sudo crm resource show
 shard0_ip (ocf::heartbeat:IPaddr2) Started  (unmanaged)
 shard1_ip (ocf::heartbeat:IPaddr2) Started  (unmanaged) FAILED
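
For scripting this check, a minimal sketch could filter that output on the trailing FAILED flag. The function name failed_resources is hypothetical, and the sketch assumes the output format is exactly the one shown above, with FAILED as the last field on the line.

```shell
# Print the names of resources that `crm resource show` flags as FAILED.
# Reads the command output on stdin, so it also works on saved output.
# Assumption: FAILED appears as the last whitespace-separated field.
failed_resources() {
    awk '$NF == "FAILED" { print $1 }'
}

# On a live cluster node (assumes crm is in PATH and sudo is permitted):
#   sudo crm resource show | failed_resources
```

Run against the output above, this should print only shard1_ip.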

However, when I set is-managed-default=true, pacemaker incorrectly
reports resource 'shard1_ip' as ok, even though the IP address is still
down.

jlewis at qa3db23:~$ sudo crm configure property is-managed-default=true

jlewis at qa3db23:~$ sudo crm resource show
 shard0_ip (ocf::heartbeat:IPaddr2) Started
 shard1_ip (ocf::heartbeat:IPaddr2) Started

I don't necessarily expect pacemaker to start this IP, since it was
stopped when pacemaker was not managing this resource, but I do expect
pacemaker to correctly report current status.
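
To confirm the mismatch independently of pacemaker, one option is to check the kernel's address list directly. The sketch below is only an illustration: the function name ip_is_configured is hypothetical, the address 192.0.2.11 is a placeholder (the real shard1_ip address is not given here), and it assumes the one-line format printed by `ip -o -4 addr show`, where the fourth field is ADDRESS/PREFIX.

```shell
# Succeed if the given IPv4 address appears in `ip -o -4 addr show` output.
# Reads that output on stdin; the address to look for is the first argument.
# Assumption: field 4 of each line has the form ADDRESS/PREFIX.
ip_is_configured() {
    awk -v want="$1" '
        { split($4, parts, "/"); if (parts[1] == want) found = 1 }
        END { exit !found }
    '
}

# On the node (192.0.2.11 is a placeholder, not the real address):
#   if ip -o -4 addr show | ip_is_configured 192.0.2.11; then
#       echo "address is up"
#   else
#       echo "address is down"
#   fi
```

If this reports the address as down while crm resource show still says Started, that is the discrepancy described above.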

Any hints?

Thanks,
Jeffrey



