[ClusterLabs] Tuchanka

Fri Oct 2 10:19:23 EDT 2020

On 10/2/20 3:15 PM, Jehan-Guillaume de Rorthais wrote:
> On Fri, 2 Oct 2020 15:18:18 +0300
> Олег Самойлов <splarv at ya.ru> wrote:
>
>>> On 29 Sep 2020, at 11:34, Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
>>> wrote:
>>>
>>>
>>> Vagrant uses VirtualBox by default, which supports softdog, but it also
>>> supports many other virtualization platforms, including e.g. libvirt/KVM,
>>> where you can use a virtualized watchdog card.
>>>   
>>>>   
>>> Vagrant can use Chef, Ansible, Salt, puppet, and others to provision VM:
>>>
>>>  https://www.vagrantup.com/docs/provisioning
>>>
>>>
>>> There are many Vagrant images available:
>>> https://app.vagrantup.com/boxes/search
>>> There are so many because building a Vagrant image is easy. I built some
>>> when RH8 wasn't available yet. So if you need a special box, with e.g. some
>>> predefined setup, you can build it quite fast.
>> My English is poor, so I'll try to find other words. My primary task was to
>> create a prototype for an automatic deployment system. So I used only the
>> same technique that will be used on the real hardware servers: Red Hat DVD
>> image + kickstart. I also wanted to test that deployment itself. That's why
>> I do not use any special image for the virtual machines.
> How exactly is using a Vagrant box you built yourself different from using
> VirtualBox, where you clone (I suppose) an existing VM you built?
>
>>> Watchdog is kind of a self-fencing method. Cluster with quorum+watchdog, or
>>> SBD+watchdog or quorum+SBD+watchdog are fine...without "active" fencing.  
>> quorum+watchdog or SBD+watchdog are useless. Quorum+SBD+watchdog is a
>> solution, but it also has some drawbacks, so it is not perfect or fine yet.
> Well, by "SBD", I meant "Storage Based Death": using shared storage to send a
> poison pill to other nodes. Not just the sbd daemon, which is used for both
> SBD and watchdog. Sorry for the shortcut and the confusion.
>
>> I'll write about it below.
>>   
>>>>> Now, with regard to your multi-site clusters and how you deal with them
>>>>> using quorum, did you read the chapter about the Cluster Ticket Registry
>>>>> in the Pacemaker doc? See:
>>>>>
>>>>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/ch15.html    
>>>> Yep, I read the whole documentation two years ago. Yep, the ticket system
>>>> looked interesting at first glance, but I didn't see a way to use it
>>>> with PAF. :)
>>> It could be interesting to have detailed feedback about that. Could you
>>> share your experience?  
>> Heh, I don't have experience of using the ticket system because I can't even
>> imagine how to use the ticket system with PAF.
> OK
>
>> As for Pacemaker without STONITH, the idea was simple: quorum + SBD as
>> the watchdog daemon.
> (this was what I describe as "quorum+watchdog", again sorry for the
> confusion :))
>
>> It is described more precisely in the README. My test system shows that
>> this mostly works. :)
>>
>> What are the possible caveats? First of all, softdog is not good for this
>> (only for testing), and the system will depend heavily on the reliability of
>> the watchdog device.
> +1
>
>> SBD is not good as a watchdog daemon. In my version it does not check
>> that corosync and the Pacemaker processes are not frozen (for instance
>> by kill -STOP). It looks like the check for corosync has already been
>> done: https://github.com/ClusterLabs/sbd/pull/83
> Good.
>
>> I don't know about checking all of the Pacemaker processes.
> This moves in the right direction, I would say:
>
>   https://lists.clusterlabs.org/pipermail/users/2020-August/027602.html
>
> The main Pacemaker process is now checked by sbd. Maybe other processes will be
> included in future releases as "more in-depth health checks" as written in this
> email.
We are targeting a hierarchical approach:

SBD checks pacemakerd, or more explicitly a timestamp recording
when pacemaker was last considered fine. That way the task of
checking the liveness of the whole group of pacemaker daemons
can be handed over to pacemakerd without the risk that a stalled
pacemakerd goes unnoticed.
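
As a rough illustration of the hierarchical idea only, here is a
minimal Python sketch. It is not sbd or pacemaker code: the class
names, the max_age parameter and the probe callables are all
hypothetical, and the real daemons exchange this information
differently. A supervisor (standing in for pacemakerd) records a
timestamp whenever all of its sub-daemons look fine, and a feeder
(standing in for sbd) only looks at how old that timestamp is.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Supervisor:
    """Hypothetical stand-in for pacemakerd: probes its sub-daemons and
    records the last time all of them were considered fine."""
    probes: Dict[str, Callable[[], bool]]
    last_healthy: float = float("-inf")  # never healthy until first good poll

    def poll(self, now: float) -> None:
        # Update the timestamp only if every sub-daemon probe succeeds.
        if all(probe() for probe in self.probes.values()):
            self.last_healthy = now


@dataclass
class WatchdogFeeder:
    """Hypothetical stand-in for sbd: it never probes the sub-daemons
    itself, it only checks the age of the supervisor's timestamp."""
    supervisor: Supervisor
    max_age: float  # assumed tolerance in seconds before we stop feeding

    def should_feed(self, now: float) -> bool:
        # A stale timestamp covers both cases: a sub-daemon is unhealthy,
        # or the supervisor itself is stalled and no longer polling.
        return (now - self.supervisor.last_healthy) <= self.max_age
```

Because the feeder relies on the timestamp alone, a frozen supervisor
(for instance after kill -STOP) simply stops refreshing it, so the
feeder stops feeding the watchdog once max_age has elapsed.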

Klaus
>
> Regards,