[ClusterLabs] Antw: Re: Occasionally IPaddr2 resource fails to start

Mon Oct 21 04:01:16 EDT 2019

>>> Donat Zenichev <donat.zenichev at gmail.com> wrote on 21.10.2019 at 09:12
in message
<CANLwQCn2MC60R9LpVqaz85w7-ozDvYKmgiNqjx-LZRXo+m=xuQ at mail.gmail.com>:
> Hello, and sorry for such a late response; I somehow missed your answer.
> 
> Sure, let me share some useful details on that.
> First of all, the system specifics are:
> - Hypervisor: VMware vSphere
> - VM OS: Ubuntu 18.04 LTS
> - Pacemaker version: 1.1.18-0ubuntu1.1
> 
> And yes, it is iproute2, version 4.15.0-2ubuntu1.
> 
> It is worth mentioning that since I switched to a different way of handling
> this (setting failure-timeout), I haven't seen any errors so far; the on-fail
> action is still "restart".
> But obviously failure-timeout just clears the fail counters for me, so I
> no longer see the failures.

Failures should still be logged in the log files. Also, failure-timeout does
not prevent a restart on failure; it only lets old failures expire, which
effectively allows more restart attempts.

> 
> Another thing worth mentioning: the monitor operation of the IPaddr2
> resource was failing in past years as well; I just didn't pay much
> attention to it.
> At that time the VMs under my control were running Ubuntu 14.04, and the
> hypervisor was Proxmox from the 5+ branch (I cannot remember the exact
> version, perhaps 5.4+).
> 
> For some setups this could indeed be critical, since a missing IP address
> (e.g. for a DB handling hundreds of thousands of SQL requests) can
> sometimes lead to a huge outage.
> I have no idea how to investigate this further. But I have a staging
> setup where my hands are not tied, so let me know if there is something we
> can research.

We had a similar case with the NFS server, and I added a script that does the
same monitoring as the RA, but logs the command's output whenever that output
changes. Unfortunately, I have not seen the error again since I added the
script ;-)
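
A minimal sketch of that kind of watcher in Python (not the actual script
mentioned above). It assumes that running "ip -o -f inet addr show dev NIC" is
close enough to what the IPaddr2 monitor checks; the function names, the
polling interval, and the log file path are made up for illustration.

```python
import subprocess
import time
from datetime import datetime


def read_addresses(nic):
    """Run `ip -o -f inet addr show dev NIC`; return (exit code, output).

    Assumption: this approximates the check done by the RA's monitor action.
    Raises FileNotFoundError if the `ip` binary is not installed.
    """
    proc = subprocess.run(
        ["ip", "-o", "-f", "inet", "addr", "show", "dev", nic],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


def record_if_changed(previous, current, log):
    """Append a timestamped entry to `log` only if the observation changed."""
    if current != previous:
        log.append("%s rc=%d output=%r"
                   % (datetime.now().isoformat(), current[0], current[1]))
    return current


def watch(nic, interval=1.0, logfile="/var/tmp/ip-watch.log"):
    """Poll forever; write to `logfile` (hypothetical path) on every change."""
    previous = None
    while True:
        entries = []
        previous = record_if_changed(previous, read_addresses(nic), entries)
        if entries:
            with open(logfile, "a") as fh:
                fh.write("\n".join(entries) + "\n")
        time.sleep(interval)
```

Calling watch("ens160") (the interface name is only an example) on the staging
node would leave a timestamped record each time the address list or the exit
code changes, which can then be compared with the times of the monitor
failures in the Pacemaker logs.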

> 
> And have a nice day!

Regards,
Ulrich

> 
> On Mon, Oct 7, 2019 at 7:21 PM Jan Pokorný <jpokorny at redhat.com> wrote:
> 
>> Donat,
>>
>> On 07/10/19 09:24 -0500, Ken Gaillot wrote:
>> > If this always happens when the VM is being snapshotted, you can put
>> > the cluster in maintenance mode (or even unmanage just the IP
>> > resource) while the snapshotting is happening. I don't know of any
>> > reason why snapshotting would affect only an IP, though.
>>
>> it might be interesting if you could share the details to grow the
>> shared knowledge and experience in case there are some instances of
>> these problems reported in the future.
>>
>> In particular, it'd be interesting to hear:
>>
>> - hypervisor
>>
>> - VM OS + if plain oblivious to running virtualized,
>>   or "the optimal arrangement" (e.g., specialized drivers, virtio,
>>   "guest additions", etc.)
>>
>> (I think IPaddr2 is iproute2-only, hence in turn, VM OS must be Linux)
>>
>> Of course, there might be more specific things to look at if anyone
>> here is an expert with particular hypervisor technology and the way
>> the networking works with it (no, not me at all).
>>
>> --
>> Poki
> 
> 
> 
> -- 
> 
> Best regards,
> Donat Zenichev