splarv at ya.ru
Fri Oct 2 08:18:18 EDT 2020
> On 29 Sep 2020, at 11:34, Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:
> Vagrant uses VirtualBox by default, which supports softdog, but it supports many
> other virtualization platforms, including e.g. libvirt/kvm, where you can use a
> virtualized watchdog card.
> Vagrant can use Chef, Ansible, Salt, Puppet, and others to provision VMs:
> There are many Vagrant images available: https://app.vagrantup.com/boxes/search
> There are many Vagrant images... because building a Vagrant image is easy. I built
> some when RH8 wasn't available yet. So if you need a special box, e.g. with some
> predefined setup, you can do it quite fast.
My English is poor, so I'll try to put it another way. My primary task was to create
a prototype of an automatic deployment system. So I used only the same technique that will
be used on the real hardware servers: a RedHat DVD image plus kickstart. This also lets me test the deployment itself.
That's why I do not use any special image for the virtual machines.
> Watchdog is kind of a self-fencing method. Cluster with quorum+watchdog, or
> SBD+watchdog or quorum+SBD+watchdog are fine...without "active" fencing.
Quorum+watchdog or SBD+watchdog are useless. Quorum+SBD+watchdog is a solution, but it also has some drawbacks,
so it is not perfect or fine yet.
I'll write about them below.
>>> Now, in regard with your multi-site clusters and how you deal with it using
>>> quorum, did you read the chapter about the Cluster Ticket Registry in
>>> Pacemaker doc ? See:
>> Yep, I read the whole documentation two years ago. Yep, the ticket system
>> looked interesting at first glance, but I didn't see how to use it
>> with PAF. :)
> It could be interesting to have detailed feedback about that. Could you share
> your experience?
Heh, I don't have any experience with the ticket system, because I can't even imagine how to use it with PAF.
As for Pacemaker without STONITH, the idea was simple: quorum + SBD as the watchdog daemon. It is described more precisely in the README.
My test system shows that this mostly works. :)
What are the possible caveats? First of all, softdog is not good for this (only for testing), and the system will depend heavily on the reliability of the watchdog device.
SBD is not good as a watchdog daemon. In my version it does not check whether corosync or any of the Pacemaker processes are frozen (for instance by kill -STOP).
It looks like the check for corosync has already been done:
I don't know about checking all the Pacemaker processes. Yes, these problems look artificial, but they must be fixed.
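To illustrate the kind of check meant here, below is a minimal Python sketch of how a watchdog daemon could notice processes frozen by kill -STOP on Linux. This is not how SBD does it; it only assumes the documented /proc/<pid>/stat layout, where state "T" means "stopped by a signal". The helper names and the process names used in the example (corosync, pacemakerd) are assumptions for illustration.

```python
import os


def parse_stat(stat_line):
    """Return (comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so we cut at the first '(' and the last ')'.
    """
    start = stat_line.index("(")
    end = stat_line.rindex(")")
    comm = stat_line[start + 1:end]
    state = stat_line[end + 1:].split()[0]
    return comm, state


def is_stopped(stat_line):
    """True if the process is in state 'T' (stopped, e.g. by SIGSTOP)."""
    return parse_stat(stat_line)[1] == "T"


def stopped_processes(names, proc_root="/proc"):
    """Return {pid: comm} for processes named in `names` that are stopped.

    Note: the kernel truncates comm to 15 characters, so longer daemon
    names would have to be given in their truncated form.
    """
    found = {}
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no procfs available (not Linux, or not mounted)
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as f:
                comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while scanning, or unreadable line
        if comm in names and state == "T":
            found[int(entry)] = comm
    return found


if __name__ == "__main__":
    # Example names only; adjust to the daemons that matter on the node.
    frozen = stopped_processes({"corosync", "pacemakerd"})
    if frozen:
        print("frozen cluster processes:", frozen)
```

A real watchdog daemon would run such a check on every loop iteration and stop feeding the watchdog device when it finds a frozen process, so that the node resets itself.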
There are other problems because this solution has not been heavily tested. For instance, with the default sync_timeout for the quorum device, both nodes get rebooted: the faulty one and the healthy one.
I don't know whether this is fixed upstream.