[ClusterLabs] Antw: [EXT] Re: Autostart/Enabling of Pacemaker and corosync

Tue Apr 27 02:48:44 EDT 2021

>>> damiano giuliani <damianogiuliani87 at gmail.com> wrote on 26.04.2021 at
20:04
in message
<CAG=zYNMvYcTWL=or2-HUFwODqrEAaQbMQShQpke8Wrgi=8Hr3g at mail.gmail.com>:
> Personally I discourage the use of automatic restart/rejoin: if something
> went wrong, it is better to investigate the causes and then enable the failed
> node again.

Well, actually I find it less stressful if the cluster keeps running and you
can examine the logs while it runs.
Typically it takes an hour or more to analyze what was going on, especially if
the cluster fenced multiple times.
The situation may be different if some resource can't be started any more, so
that immediate action is required; but usually external events (like network
outages, CPU load, or full disks) cause the problems.

> Failovers shouldn't occur frequently, only if something went really wrong.
> As far as I know, Pacemaker and PAF don't support any kind of auto-healing,
> so it should be a good thing to check the causes first.

Checking the cluster periodically is even better: You can fix things before
they really cause a serious problem.
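
A periodic check of this kind could start by reporting whether corosync and
pacemaker are enabled at boot and whether they are currently active. The
following is only a minimal sketch, assuming that systemd manages both
services under the unit names "corosync" and "pacemaker"; the helper names
unit_state and report are illustrative and not part of any cluster tool.

```python
import subprocess

# Assumed unit names; adjust if your distribution names them differently.
UNITS = ("corosync", "pacemaker")


def unit_state(unit, query="is-enabled", run=subprocess.run):
    """Return systemctl's one-word answer for a unit, or 'unknown'.

    `query` is a systemctl verb such as "is-enabled" or "is-active".
    `run` is injectable so the function can be exercised without systemd.
    """
    try:
        result = run(
            ["systemctl", query, unit],
            capture_output=True,
            text=True,
            check=False,  # a non-zero exit (e.g. for "disabled") is not an error here
        )
    except FileNotFoundError:
        # systemctl is not installed on this host
        return "unknown"
    # systemctl prints nothing on stdout for units it does not know
    return result.stdout.strip() or "unknown"


def report(units=UNITS, run=subprocess.run):
    """Map each unit to its boot-time setting and its current state."""
    return {
        unit: {
            "boot": unit_state(unit, "is-enabled", run),
            "now": unit_state(unit, "is-active", run),
        }
        for unit in units
    }
```

Calling report() on a node returns one entry per unit, which a cron job or a
monitoring script could log or compare against the intended policy
(autostart enabled or deliberately disabled).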

> This is just my opinion and way to work, probably someone more expert can
> join the conversation.

Regards,
Ulrich

> 
> Best,
> 
> Damiano
> 
> On Mon, 26 Apr 2021 at 19:04, Moneta, Howard <
> Howard.Moneta at csaa.com> wrote:
> 
>> Hello community.  I have read that it is not recommended to set Pacemaker
>> and corosync to enabled/auto start on the nodes.  Is this how people have
>> it configured? If a computer restarts unexpectedly, is it better to
>> manually investigate first or allow the node to come back online and rejoin
>> the cluster automatically in order to minimize downtime?  If the auto start
>> is not enabled, how do you handle patching?  I’m using Pacemaker with PAF,
>> PostgreSQL Automatic Failover. I had thought to follow the published
>> guidance and not set those processes to enabled but other coworkers are
>> resisting and saying that the systems should be configured to recover by
>> themselves around patching or even a temporary unplanned
>> network/virtualization glitch.
>>
>>
>>
>> Thanks,
>>
>> Howard
>>
>>
>>