[ClusterLabs] fence_mpath and failed IP

Strahil Nikolov hunter86_bg at yahoo.com
Tue Mar 31 00:31:57 EDT 2020


On March 31, 2020 5:56:26 AM GMT+03:00, Ken Gaillot <kgaillot at redhat.com> wrote:
>On Sat, 2020-02-22 at 03:50 +0200, Strahil Nikolov wrote:
>> Hello community,
>> 
>> Recently I have started playing with fence_mpath and I have noticed
>> that when a node is fenced, it is kicked out of the
>> cluster (corosync & pacemaker are shut down).
>> 
>> Fencing works correctly, but the IP address cannot be brought up on
>> the designated 'replacement' host, because it is still active on the old
>> node.
>> 
>> I believe that this is a timing issue: the fenced node doesn't have
>> time to shut down all its resources before pacemaker dies locally.
>> 
>> Can someone confirm this behaviour on another distro, as I'm
>> currently testing it on RHEL7? If it only happens on Red Hat, I can open
>> a bug in Bugzilla.
>> 
>> Note: there is a workaround that reboots the node, using
>> a symbolic link in /etc/watchdog.d pointing to the
>> fence_scsi or fence_mpath scripts in /usr/share/cluster.
>> 
>> 
>> Best Regards,
>> Strahil Nikolov
>
>I'm not an expert on fabric fencing, but from what I understand, this is
>an inherent limitation. Cutting off the disk obviously has no effect on
>resources (like an IP) that don't require that disk.
>
>Pacemaker 2.0.3 added a new cluster property, "fence-reaction", that
>controls what a node does when notified of its own fencing. That's
>intended for cases like this (though it is only useful if the node is
>still functioning well enough to process the notification). The default
>of "stop" is pacemaker's traditional response -- immediately stop
>pacemaker itself, which can leave resources running. Using "panic" will
>make pacemaker halt the node instead.
>
>In theory, the ideal solution would be to use a fencing topology to
>combine disk fencing with network access fencing via a smart switch.
>However, there is a bug with that setup.
>
>I'm not sure what people have traditionally done about the problem.

Hey Ken,
Thanks for your reply.
I found out that fence_mpath provides a script which can be used with watchdog.service and power cycles the node if its reservation keys are gone.

Actually it turned out to be quite effective.
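For anyone curious how such a watchdog check works, here is a rough Python sketch of the idea (it is NOT the script shipped in /usr/share/cluster). Assumptions, all hypothetical: the device path DEVICE and the node's key LOCAL_KEY are placeholders, the key list comes from `mpathpersist -i -k -d <device>` and its output format is assumed to put each registered key on its own line starting with "0x", and the watchdog daemon is assumed to run scripts from its test directory with "test" and treat a non-zero exit as a failure.

```python
import subprocess

# Hypothetical placeholders: adjust to your own multipath device and node key.
DEVICE = "/dev/mapper/mpatha"
LOCAL_KEY = 0x1

def registered_keys(mpathpersist_output):
    """Collect reservation keys from mpathpersist read-keys output.

    Assumed format: every registered key sits on its own line, e.g. "    0x1".
    """
    keys = set()
    for line in mpathpersist_output.splitlines():
        token = line.strip()
        if token.lower().startswith("0x"):
            keys.add(int(token, 16))
    return keys

def node_is_fenced(local_key, mpathpersist_output):
    """The node counts as fenced once its own key is no longer registered."""
    return local_key not in registered_keys(mpathpersist_output)

def main(argv):
    """Watchdog-style entry point: exit code 0 = healthy, non-zero = failure."""
    mode = argv[1] if len(argv) > 1 else "test"
    if mode != "test":
        # Assumed: failing the "repair" call lets watchdog go on to reboot.
        return 1
    try:
        result = subprocess.run(["mpathpersist", "-i", "-k", "-d", DEVICE],
                                capture_output=True, text=True)
    except FileNotFoundError:
        # No mpathpersist binary: report failure (a deliberately strict choice).
        return 1
    if result.returncode != 0:
        return 1
    return 1 if node_is_fenced(LOCAL_KEY, result.stdout) else 0
```

To use it you would call `sys.exit(main(sys.argv))` from the script's entry point. Failing closed (reboot when the keys cannot be read) is a design choice in this sketch; the real script may behave differently.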

The 'panic' setting for fence-reaction also seems like a good option.
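For reference, a minimal Python sketch of switching the property, assuming Pacemaker 2.0.3 or later and that `crm_attribute` is on the PATH. It only allows the two values discussed in this thread ("stop" and "panic"); newer Pacemaker releases may accept others. By default it just prints the command instead of running it.

```python
import subprocess

# Only the values discussed in this thread; "stop" is the default.
ALLOWED_FENCE_REACTIONS = ("stop", "panic")

def fence_reaction_command(value):
    """Build the crm_attribute argv that sets the fence-reaction cluster property."""
    if value not in ALLOWED_FENCE_REACTIONS:
        raise ValueError(f"unsupported fence-reaction: {value!r}")
    return ["crm_attribute", "--type", "crm_config",
            "--name", "fence-reaction", "--update", value]

def set_fence_reaction(value, dry_run=True):
    """Print the command (dry run) or execute it, raising if crm_attribute fails."""
    cmd = fence_reaction_command(value)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd

if __name__ == "__main__":
    set_fence_reaction("panic")
```

Keep in mind Ken's caveat: "panic" only helps if the fenced node is still healthy enough to receive the notification of its own fencing.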

Best Regards,
Strahil Nikolov

