[Pacemaker] Problem with dual-PDU fencing node with redundant PSUs

Digimer lists at alteeve.ca
Thu Jun 27 09:54:13 EDT 2013


On 06/27/2013 07:02 AM, Dejan Muhamedagic wrote:
> Hi,
> 
> On Wed, Jun 26, 2013 at 03:52:00PM -0400, Digimer wrote:
>> This question appears to be the same issue asked here:
>>
>> http://oss.clusterlabs.org/pipermail/pacemaker/2013-June/018650.html
>>
>> In my case, I have two fence methods per node: IPMI first, with
>> action="reboot", and, if that fails, two PDUs (one backing each of
>> the node's redundant PSUs).
>>
>> Initially I set up the PDUs with action "reboot", figuring that the
>> fencing_topology tied them together, so Pacemaker would call "pdu1:port1;
>> off -> pdu2:port1; off; (verify both are off) -> pdu1:port1; on ->
>> pdu2:port1; on".
>>
>> This didn't happen, though. It called "pdu1:port1; reboot", then
>> "pdu2:port1; reboot", so the first PSU in the node had its power back
>> before the second PSU lost power, meaning the node never powered off.
> 
> I'm not sure if that's supported.

Unless I misunderstood, beekhof indicated that it is (or should be).
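
To make the failure mode in the first attempt concrete, here is a minimal Python sketch. It is a toy model, not Pacemaker or fence-agent code: it assumes each PDU outlet feeds one PSU, that the node stays up while at least one PSU has power, and that a per-device "reboot" behaves as "off, then on" on that outlet before the next device is touched (which matches what was observed). The PDU names are illustrative.

```python
from typing import List, Tuple

# Toy model (assumption): outlet "pdu1" feeds PSU 1, outlet "pdu2" feeds PSU 2.
# The node keeps running as long as at least one PSU has power.
Step = Tuple[str, str]  # (pdu name, action): "off", "on" or "reboot"


def node_lost_power(steps: List[Step]) -> bool:
    """Return True if both PSUs were unpowered at the same moment."""
    outlets = {"pdu1": True, "pdu2": True}  # both outlets start powered
    for pdu, action in steps:
        # A per-device "reboot" finishes (off, then on) before the next step runs.
        expanded = ["off", "on"] if action == "reboot" else [action]
        for a in expanded:
            outlets[pdu] = a == "on"
            if not any(outlets.values()):
                return True
    return False


if __name__ == "__main__":
    per_device_reboot = [("pdu1", "reboot"), ("pdu2", "reboot")]
    off_then_on = [("pdu1", "off"), ("pdu2", "off"), ("pdu1", "on"), ("pdu2", "on")]
    print(node_lost_power(per_device_reboot))  # False: one PSU always has power
    print(node_lost_power(off_then_on))        # True: both PSUs are off together
```

Whatever order the per-device reboots run in, one outlet is always live, so only an "off on every outlet first, then on" sequence actually drops the node.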

>> So next I tried:
>>
>> pdu1:port1; off -> pdu2:port1; off -> pdu1:port1; on -> pdu2:port1; on
>>
>> However, this seems to have actually done:
>>
>> pdu1:port1; reboot -> pdu2:port1; reboot -> pdu1:port1; reboot ->
>> pdu2:port1; reboot
>>
>> So, again, the node never lost power to both PSUs at the same time,
>> and so it never powered off.
>>
>> This makes PDU fencing unreliable. I know beekhof said:
>>
>>   "My point would be that action=off is not the correct way to configure
>> what you're trying to do."
>>
>> in the other thread, but there was no elaboration on what *is* the right
>> way. So if neither approach works, what is the proper way to configure
>> PDU fencing when you have two different PDUs, each backing one PSU?
> 
> The fence action needs to be defined in the cluster properties
> (crm_config/cluster_property_set in XML):
> 
> # crm configure property stonith-action=off
> 
> See the output of:
> 
> $ crm ra info pengine
> 
> for the PE metadata and explanation of properties.

On IRC last night, beekhof mentioned that action="..." is ignored and
replaced. However, it appears that setting pcmk_reboot_action="..." should
force the issue. I'm planning to test this today.
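
A minimal Python sketch of what that override would need to achieve, assuming (untested, as noted) that a stonith device with pcmk_reboot_action set runs that action whenever the cluster requests "reboot". The idea of splitting each PDU outlet into separate "off" and "on" devices within one topology level, and the device names used, are hypothetical illustrations, not a confirmed Pacemaker recipe. The IPMI device is left without an override, so it would still reboot normally.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FenceDevice:
    # Hypothetical stand-in for a stonith primitive; names are illustrative only.
    name: str
    pdu: str
    pcmk_reboot_action: Optional[str] = None  # assumed override used for "reboot"


def run_level(level: List[FenceDevice], requested: str) -> List[Tuple[str, str]]:
    """Return the (pdu, action) steps one topology level would issue, in order.

    Assumption: if "reboot" is requested and a device has an override,
    the device runs the override action instead.
    """
    steps = []
    for dev in level:
        action = requested
        if requested == "reboot" and dev.pcmk_reboot_action:
            action = dev.pcmk_reboot_action
        steps.append((dev.pdu, action))
    return steps


def simulate(steps: List[Tuple[str, str]]) -> Tuple[bool, Dict[str, bool]]:
    """Return (node lost power at some point, final outlet states)."""
    outlets = {"pdu1": True, "pdu2": True}
    lost = False
    for pdu, action in steps:
        for a in (["off", "on"] if action == "reboot" else [action]):
            outlets[pdu] = a == "on"
            lost = lost or not any(outlets.values())
    return lost, outlets


# Hypothetical second fencing level for one node: both outlets off, then both on.
level_2 = [
    FenceDevice("fence_n01_pdu1_off", "pdu1", "off"),
    FenceDevice("fence_n01_pdu2_off", "pdu2", "off"),
    FenceDevice("fence_n01_pdu1_on", "pdu1", "on"),
    FenceDevice("fence_n01_pdu2_on", "pdu2", "on"),
]

if __name__ == "__main__":
    steps = run_level(level_2, "reboot")
    print(steps)            # [('pdu1', 'off'), ('pdu2', 'off'), ('pdu1', 'on'), ('pdu2', 'on')]
    print(simulate(steps))  # (True, {'pdu1': True, 'pdu2': True})
```

If the override behaves as assumed, a single "reboot" request yields a real power-off followed by power-on, which also covers the "node should come back up" requirement without changing stonith-action globally.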

>> I don't want to disable "reboot" globally because I still want the
>> IPMI-based fencing to do action="reboot".
> 
> I don't think it is possible to define a per-resource fencing
> action.
> 
>> If I just do "off", then the
>> node will not power back on after a successful fence. This is better
>> than nothing, but still quite sub-optimal.
> 
> Yes, that's sub-optimal if you want the cluster stack to start
> automatically on reboot. Anyway, I think it would be preferable to
> let a human check why the node got fenced before letting it rejoin
> the cluster. In that case, one just needs to boot the host manually.
> 
> Thanks,
> 
> Dejan

I don't want the cluster stack to start on boot, so I disable
pacemaker/corosync. However, I do want the node to power back on so that
I can log into it when the alarms go off. Yes, I could log into the good
node, manually unfence/boot it and then log in, but this adds minutes to
the MTTR that I would realllly like to avoid.

cheers

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



