[ClusterLabs] Pacemaker tries to demote resource that isn't running and returns OCF_FAILED_MASTER

Thu Aug 20 23:32:35 EDT 2015

21.08.2015 00:35, Brian Campbell wrote:
> I have a master/slave resource (with a custom resource agent) which,
> if it uncleanly shut down, will return OCF_FAILED_MASTER on the next
> "monitor" operation. This seems to be what
> http://www.linux-ha.org/doc/dev-guides/_literal_ocf_failed_master_literal_9.html
> suggests that exit code should be used for.
>
> After the node is fenced, and comes up again, Pacemaker probes all of
> the resources. It gets the OCF_FAILED_MASTER exit code, and decides
> that it needs to demote the resource. So it executes the demote
> action. My resource agent returns an error on a demote action if it is
> not running, which seems to be the suggested behavior according to
> http://www.linux-ha.org/doc/dev-guides/_literal_demote_literal_action.html
>
> This then causes Pacemaker to log a failure for the "demote" action,
> and then try to recover by stopping (which succeeds cleanly because
> the resource is stopped) followed by starting it again (which again
> succeeds, as we can start in slave mode from a failed state). So the
> end state is correct, but crm_mon shows a failed action that you need
> to clear out:
>
> Failed actions:
>      editshare.stack.7c645b0e-46bb-407e-b48a-92ec3121f2d7.lizardfs-master.primitive_demote_0
> (node=es-efs-master2, call=73, rc=1, status=complete,
> last-rc-change=Thu Aug 20 12:52:21 2015, queued=54ms, exec=1ms):
> unknown error
>
> I'm curious about whether the behavior of my resource agent is
> correct. Should I not be returning OCF_FAILED_MASTER upon the
> "monitor" operation if the resource isn't started?

Correct, it should not. If the resource is not started, it can be neither
master nor slave; it can become master only after Pacemaker has requested
promotion. An unexpected master would be an error just the same.
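
To illustrate the point about the probe, here is a minimal sketch of a
monitor action that reports "not running" for a stopped instance and
keeps OCF_FAILED_MASTER for an instance that really is running as a
broken master. The exit code values are the standard OCF ones; the
DAEMON_* variables and the daemon_* helpers are hypothetical stand-ins
for whatever checks your agent actually performs.

```shell
#!/bin/sh
# Standard OCF exit codes (a real agent would get these from ocf-shellfuncs).
OCF_SUCCESS=0
OCF_NOT_RUNNING=7
OCF_RUNNING_MASTER=8
OCF_FAILED_MASTER=9

# Hypothetical state probes; replace with real checks (pid file, status
# query, and so on). They read illustrative variables only.
daemon_is_running() { [ "${DAEMON_RUNNING:-0}" -eq 1 ]; }
daemon_is_master()  { [ "${DAEMON_ROLE:-slave}" = "master" ]; }
daemon_is_healthy() { [ "${DAEMON_HEALTHY:-1}" -eq 1 ]; }

agent_monitor() {
    # A stopped instance is neither master nor slave, whatever happened
    # before the node was fenced.
    if ! daemon_is_running; then
        return $OCF_NOT_RUNNING
    fi
    if daemon_is_master; then
        if daemon_is_healthy; then
            return $OCF_RUNNING_MASTER
        fi
        # Only a running but broken master reports a failed master.
        return $OCF_FAILED_MASTER
    fi
    # Running as slave (slave health checks omitted in this sketch).
    return $OCF_SUCCESS
}
```

With this shape, the probe after a reboot returns OCF_NOT_RUNNING, so
Pacemaker has no master to demote in the first place.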

If you can determine that one resource instance is more suitable to
become master than another, set the master scores accordingly so that
Pacemaker promotes the correct instance.
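
A rough sketch of how an agent might do that with crm_master, which is
normally called from inside the resource agent. The score values (100
and 10) and the "clean"/"unclean" classification are arbitrary
assumptions for illustration, and the CRM_MASTER variable exists only so
the command can be swapped out when trying the function outside a
cluster.

```shell
#!/bin/sh
# Command used to set the promotion preference; overridable for dry runs.
CRM_MASTER="${CRM_MASTER:-crm_master}"

# Hypothetical helper: $1 describes the local data state as judged by
# the agent ("clean", "unclean", or anything else for "not eligible").
update_master_score() {
    case "$1" in
        clean)
            # Known-good data: prefer this instance for promotion.
            $CRM_MASTER -l reboot -v 100
            ;;
        unclean)
            # Usable after an unclean shutdown, but less preferred.
            $CRM_MASTER -l reboot -v 10
            ;;
        *)
            # Not eligible: remove any preference for this node.
            $CRM_MASTER -l reboot -D
            ;;
    esac
}
```

Calling something like this from the start and monitor actions lets the
instance that shut down uncleanly advertise a lower preference instead
of reporting a failed master.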

>                                                    Or should the
> "demote" operation do something different in this state, like actually
> starting up the slave?
>

In general, if the current resource state is already what it would be
after the operation completed, there is absolutely no reason to return an
error; just report that the operation succeeded.
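
As a sketch of that idea for demote: the action succeeds whenever no
master is left running on the node, and does real work only when the
instance is currently master. DAEMON_STATE and do_demote are
hypothetical placeholders for your agent's own state check and demotion
logic, and treating a stopped instance as "already demoted" is an
assumption taken from the case described in this thread.

```shell
#!/bin/sh
# Standard OCF exit codes (a real agent would get these from ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical state: "stopped", "slave" or "master".
DAEMON_STATE="${DAEMON_STATE:-stopped}"

# Hypothetical helper that switches a running master to slave.
do_demote() { DAEMON_STATE=slave; }

agent_demote() {
    case "$DAEMON_STATE" in
        slave|stopped)
            # Nothing is running as master here, so there is nothing to
            # undo; report success instead of an error.
            return $OCF_SUCCESS
            ;;
        master)
            do_demote || return $OCF_ERR_GENERIC
            return $OCF_SUCCESS
            ;;
        *)
            # Unknown state: this is a genuine failure.
            return $OCF_ERR_GENERIC
            ;;
    esac
}
```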

> It seems like the behavior of Pacemaker is different than what's
> documented in the resource agent guide, so I'm trying to figure out if
> this is a bug in my resource agent, a bug in Pacemaker, a
> misunderstanding on my part, or actually intended behavior.
>
> -- Brian