[ClusterLabs] Pacemaker tries to demote resource that isn't running and returns OCF_FAILED_MASTER

Andrew Beekhof andrew at beekhof.net
Fri Aug 28 04:14:43 UTC 2015


> On 21 Aug 2015, at 1:32 pm, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> 
> 21.08.2015 00:35, Brian Campbell wrote:
>> I have a master/slave resource (with a custom resource agent) which,
>> if it uncleanly shut down, will return OCF_FAILED_MASTER on the next
>> "monitor" operation. This seems to be what
>> http://www.linux-ha.org/doc/dev-guides/_literal_ocf_failed_master_literal_9.html
>> suggests that exit code should be used for.
>> 
>> After the node is fenced, and comes up again, Pacemaker probes all of
>> the resources. It gets the OCF_FAILED_MASTER exit code, and decides
>> that it needs to demote the resource. So it executes the demote
>> action. My resource agent returns an error on a demote action if it is
>> not running, which seems to be the suggested behavior according to
>> http://www.linux-ha.org/doc/dev-guides/_literal_demote_literal_action.html
>> 
>> This then causes Pacemaker to log a failure for the "demote" action,
>> and then try to recover by stopping (which succeeds cleanly because
>> the resource is stopped) followed by starting it again (which again
>> succeeds, as we can start in slave mode from a failed state). So the
>> end state is correct, but crm_mon shows a failed action that you need
>> to clear out:
>> 
>> Failed actions:
>>     editshare.stack.7c645b0e-46bb-407e-b48a-92ec3121f2d7.lizardfs-master.primitive_demote_0
>> (node=es-efs-master2, call=73, rc=1, status=complete,
>> last-rc-change=Thu Aug 20 12:52:21 2015, queued=54ms, exec=1ms): unknown error
>> 
>> I'm curious about whether the behavior of my resource agent is
>> correct. Should I not be returning OCF_FAILED_MASTER upon the
>> "monitor" operation if the resource isn't started?
> 
> Correct. If the resource is not started, it cannot be master or slave; it can become master only after Pacemaker has requested it. An unexpected master would be the same kind of error.
> 
> If you can determine that one resource instance is more suitable to become master than another, you should set the master score accordingly so Pacemaker will promote the correct instance.
> 
>>                                                   Or should the
>> "demote" operation do something different in this state, like actually
>> starting up the slave?
>> 
> 
> In general, if the current resource state is the same as it would be after the operation completed, there is absolutely no reason to return an error; just pretend the operation succeeded.

Always return the actual state, i.e. OCF_NOT_RUNNING in these two cases.

Only return OCF_FAILED_MASTER if you know enough to say that it's in the master state (e.g. a lock file or similar mechanism) but not able to handle requests.
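
As a rough illustration of the rule above, here is a minimal Python sketch of how an agent's "monitor" and "demote" actions might map observed state to exit codes. The numeric values are the standard OCF return codes; everything else (the State class, its fields, and the function names) is hypothetical and not taken from any real agent. It also assumes that a leftover master marker with no live process counts as "not running", which is how I read the two cases discussed in this thread.

```python
from dataclasses import dataclass

# Standard OCF return codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7
OCF_RUNNING_MASTER = 8
OCF_FAILED_MASTER = 9


@dataclass
class State:
    """Hypothetical snapshot of what the agent can observe on this node."""
    process_running: bool  # is the daemon process alive?
    master_marker: bool    # lock file or similar says "this instance is master"
    serving: bool          # does the daemon answer requests?


def monitor(state: State) -> int:
    """Report the actual state of the resource."""
    if not state.process_running:
        # Assumption: a stale marker left by an unclean shutdown does not
        # make the resource a master; nothing is running, so say so.
        return OCF_NOT_RUNNING
    if state.master_marker:
        # Known to be in the master role; failed only if it cannot serve.
        return OCF_RUNNING_MASTER if state.serving else OCF_FAILED_MASTER
    # Running in the slave role.
    return OCF_SUCCESS if state.serving else OCF_ERR_GENERIC


def demote(state: State) -> int:
    """Demote a running master; report the actual state otherwise."""
    if not state.process_running:
        return OCF_NOT_RUNNING
    state.master_marker = False  # stand-in for the real demotion step
    return OCF_SUCCESS
```

With this shape, a probe after a fence and reboot sees no process and gets OCF_NOT_RUNNING, so Pacemaker has no reason to schedule a demote in the first place.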

> 
>> It seems like the behavior of Pacemaker is different than what's
>> documented in the resource agent guide, so I'm trying to figure out if
>> this is a bug in my resource agent, a bug in Pacemaker, a
>> misunderstanding on my part, or actually intended behavior.
>> 
>> -- Brian
>> 
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>> 




