[Pacemaker] [RFC PATCH] Try to fix startup-fencing not happening

Andrew Beekhof andrew at beekhof.net
Mon Apr 4 05:39:44 EDT 2011

On Fri, Mar 25, 2011 at 11:41 AM, Simone Gotti <simone.gotti at gmail.com> wrote:
> On 03/25/2011 11:10 AM, Andrew Beekhof wrote:
>> On Thu, Mar 17, 2011 at 11:54 PM, Simone Gotti <simone.gotti at gmail.com> wrote:
>>> Hi,
>>> When using corosync + pcmk v1 and starting both corosync and pacemakerd (and
>>> I think also when using heartbeat or anything other than cman as quorum
>>> provider), at startup the CIB will not contain a <node_state/> entry for
>>> the nodes that are not in the cluster.
>> No, I'm pretty sure heartbeat has the same behavior.
> I haven't tested it, but if it works like cman then I think that
> startup-fencing won't work on it either. But that would be very strange.
>>> Instead when using cman as quorum provider there will be a <node_state>
>>> for every node known by cman as lib/common/ais.c:cman_event_callback
>>> calls crm_update_peer for every node reported by cman_get_nodes.
>> Yep
>>> Something similar will happen when using corosync+pcmkv1 if corosync is
>>> started on N nodes but pacemakerd is started only on N-M nodes.
>> Probably true.
>>> All of this will break 'startup-fencing' because, from my understanding,
>>> the logic is this:
>>> 1) At startup all the nodes are marked (in
>>> lib/pengine/unpack.c:unpack_node) as unclean.
>>> 2) lib/pengine/unpack.c:unpack_status iterates only over the available
>>> <node_state/> entries in the cib status section, resetting them to a clean
>>> status at the start and then marking them as unclean if some conditions are met.
>>> 3) In pengine/allocate.c:stage6 all the unclean nodes are fenced.
>>> In the above conditions you'll have a <node_state/> in the cib status
>>> section also for nodes without pacemakerd enabled and the startup
>>> fencing won't happen because there isn't any condition in unpack_status
>>> that will mark them as unclean.
>> But they're unclean by default... so the lack of a node_state
>> shouldn't affect that.
>> Or did you mean "clean" instead of "unclean"?
> The problem is not the lack of node state but the opposite: the presence
> of a node state even for nodes that haven't joined the cluster. This
> happens with the current cman integration.
> The nodes known to pacemaker are all set as unclean by default (point
> 1 above).
> But if their <node_state/> is available in the CIB, then in point 2 they
> will be set as clean (unclean=false) and no condition check in
> unpack_status will mark them as unclean=true again.

Ok, I understand what you're saying now.
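
To restate the sequence as a minimal sketch: the following is a simplified
Python model of the three steps described above, not the actual pengine code.
The function name, the dict layout and the "failed" key are assumptions made
up for illustration; the real conditions checked in unpack_status are not
reproduced here.

```python
def nodes_to_fence(known_nodes, node_states, startup_fencing=True):
    """Return the names of nodes that would be fenced in this simplified model.

    known_nodes: names of all nodes known to the cluster.
    node_states: dict mapping node name -> dict standing in for a
                 <node_state/> entry (hypothetical layout).
    """
    # Step 1 (unpack_node): every known node starts out unclean
    # when startup-fencing is enabled.
    unclean = {name: startup_fencing for name in known_nodes}

    # Step 2 (unpack_status): only nodes that have a <node_state/> entry
    # are visited. Each is reset to clean, then marked unclean again only
    # if some condition is met ("failed" is a hypothetical stand-in).
    for name, state in node_states.items():
        unclean[name] = False
        if state.get("failed", False):
            unclean[name] = True

    # Step 3 (stage6): every node that is still unclean gets fenced.
    return sorted(name for name, flag in unclean.items() if flag)


nodes = ["A", "B", "C"]

# No <node_state/> for C (it never joined): C stays unclean and is fenced.
without_entry = nodes_to_fence(nodes, {"A": {}, "B": {}})

# <node_state/> present for C even though it never joined (the cman case):
# step 2 resets C to clean and nothing marks it unclean again.
with_entry = nodes_to_fence(nodes, {"A": {}, "B": {}, "C": {}})
```

In this model the mere presence of an entry for C is what cancels the
default unclean marking, which matches the behaviour you describe.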

>>> I'm not an expert on the code. I discarded the solution of not
>>> registering at startup all the nodes known by cman but only the active ones,
>>> as it won't fix the corosync+pcmkv1 case.
>>> Instead I tried to understand when a node that has its status in the cib
>>> should be startup fenced and a possible solution is in the attached patch.
>>> I noticed that when crm_update_peer inserts a new node this one doesn't
>>> have the expected attribute set. So if startup-fencing is enabled I'm
>>> going to set the node as expected up.
>> You lost me there... isn't this covered by just setting startup-fencing=false?
> I lost you too :D . The problem is that startup-fencing is not working.
> Anyway. This first patch is a sort of attempt to make startup-fencing
> work when in the CIB there are <node_state/> tags also for nodes not in
> the cluster. But it was a quick attempt that I don't like, as my
> intention was primarily to explain the actual problem. But probably I
> wasn't very clear in doing this. Sorry.
> In the mail I sent after this one, I tried to make a first step changing
> the behavior of the cman integration to make it work like the other
> implementations: add <node_state/> tag only for the hosts that joined
> the cluster.

That patch looks dangerous.

If A comes up and then B, then:
 A will have entries for A, B, C and D, but
 B will only have entries for A and B

Can you file a bug for this please?
