[ClusterLabs] Unfencing cause resource restarts

Tue Oct 11 15:33:12 UTC 2016

11.10.2016 17:40, Ken Gaillot:
> On 10/11/2016 07:06 AM, Pavel Levshin wrote:
>> Hi!
>>
>> In continuation of previous mails, I now have a more complex setup. Our
>> hardware is capable of two STONITH methods: ILO and SCSI persistent
>> reservations on shared storage. The first method works fine;
>> nevertheless, in the past we have sometimes faced problems with
>> inaccessible ILO devices or the like. So we would like to have SCSI
>> fencing as an additional method.
>>
>> The problem: when node 2 recovers, some resources are simply stopped and
>> restarted on node 1. As far as I understand, primitive resources are
>> affected, but clone instances are not.
>>
>> In the example below, when bvnode2 recovers, vm_smartbv1 is restarted on
>> bvnode1, and vm_smartbv2 is live-migrated without interruption to
>> bvnode2. All other resources are clones working on bvnode1 and they are
>> unaffected.
>>
>> If I set "meta requires=fencing" for the VM resources, they are no
>> longer restarted. But why does unfencing bvnode2 affect resources
>> running on bvnode1?
> That does seem odd.
>
> Something I notice in the config below is that only the ILO devices are
> listed in the fence topology, and the only fence level is "10". Valid
> indexes are 1 to 9, so this should have produced a log error about "Bad
> topology".
>
> If you want the storage fencing as a fallback in case ILO fails, you
> want the devices in two levels, e.g. level 1 = ILO, level 2 = storage.

There were levels 10 and 20 earlier, and this worked (aside from the
problem with unwanted restarts). The docs say that fencing levels are
numeric and are tried in ascending order; they show no restriction on
those numbers. There are no errors about bad topology. Besides, levels
only come into play when it is time to fence someone, which does not
happen here.

So I assume that levels have nothing to do with the problem. Now the 
topology is:

ilo.bvnode2     (stonith:fence_ilo4):   Started bvnode1
ilo.bvnode1     (stonith:fence_ilo4):   Started bvnode2
storage.bvnode1 (stonith:fence_mpath):  Started bvnode1
storage.bvnode2 (stonith:fence_mpath):  Started bvnode2

Node: bvnode1
   Level 10 - ilo.bvnode1
   Level 20 - storage.bvnode1
Node: bvnode2
   Level 10 - ilo.bvnode2
   Level 20 - storage.bvnode2
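To illustrate the ordering point, here is a minimal Python sketch that models the topology listed above. It relies on the rule as paraphrased from the docs (levels are tried in ascending numeric order) plus one added assumption: a level succeeds only if every device in it succeeds. `fence`, `ilo_down` and `run_device` are hypothetical names standing in for real fence-agent calls. This is not Pacemaker's implementation, and it does not model any validation of the index range that Ken mentioned.

```python
from typing import Callable, Dict, List, Optional

# Topology as listed above: node -> {level index: [device ids]}.
TOPOLOGY: Dict[str, Dict[int, List[str]]] = {
    "bvnode1": {10: ["ilo.bvnode1"], 20: ["storage.bvnode1"]},
    "bvnode2": {10: ["ilo.bvnode2"], 20: ["storage.bvnode2"]},
}


def fence(node: str,
          topology: Dict[str, Dict[int, List[str]]],
          run_device: Callable[[str], bool]) -> Optional[int]:
    """Return the index of the first level that fenced `node`, or None.

    Levels are tried in ascending numeric order; a level counts as
    successful only if all of its devices succeed (assumption).
    """
    levels = topology[node]
    for index in sorted(levels):
        if all(run_device(device) for device in levels[index]):
            return index
    return None


# Hypothetical stand-in for a fence agent: pretend every ILO is unreachable.
def ilo_down(device: str) -> bool:
    return not device.startswith("ilo.")


print(fence("bvnode2", TOPOLOGY, ilo_down))  # 20: falls back to storage.bvnode2

# Renumbering the same levels as 1 and 2 keeps the same attempt order.
renumbered = {n: {1: lv[10], 2: lv[20]} for n, lv in TOPOLOGY.items()}
print(fence("bvnode2", renumbered, ilo_down))  # 2
```

Under this model only the relative order of the levels matters, so 10/20 and 1/2 try the devices in the same sequence. Whether Pacemaker itself accepts indexes above 9 is a separate question that the sketch does not answer.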

--
Pavel Levshin
