[ClusterLabs] Antw: Re: Is fencing really a must for Postgres failover?

Klaus Wenninger kwenning at redhat.com
Wed Feb 13 10:37:01 EST 2019


On 02/13/2019 04:29 PM, Ulrich Windl wrote:
> Hi!
>
> I wonder: Can we close this thread with "You have been warned, so please don't
> come back later, crying! In the meantime you can do what you want to do."?

I think something like digimer's answer is the better and
more general advice:

If you think you don't need fencing then you probably don't need a
cluster (or you missed something ;-) ).

Klaus

>
> Regards,
> Ulrich
>
>>>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> schrieb am 13.02.2019 um
> 15:05 in
> Nachricht <20190213150549.47634671 at firost>:
>> On Wed, 13 Feb 2019 13:50:17 +0100
>> Maciej S <internet at swierki.com> wrote:
>>
>>> Can you describe at least one situation when it could happen?
>>> I see situations where data on two masters can diverge, but I can't find
>>> the one where data gets corrupted. Or maybe you think that some kind of
>>> restoration is required in case of diverged data, but this is not my use
>>> case (I can live with a loss of some data on one branch and recover it
>>> from the working master).
>> With imagination and some "ifs", we can describe some scenarios, but chaos
>> is much more creative than me. Anyway, below is one situation:
>>
>>   PostgreSQL doesn't do any sanity check when starting as a standby and
>>   catching up with a primary. If your old primary crashed and catches up
>>   with the new one without some housecleaning first by a human (rebuilding
>>   it or using pg_rewind), it will be corrupted.
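
[Editor's note: the "housecleaning" step mentioned above could be sketched
roughly as follows. The data directory path, the `new-primary` hostname and
the connection user are hypothetical placeholders, not from the original
mail.]

```shell
# Hypothetical sketch: resync a crashed old primary before it rejoins
# the cluster as a standby. Paths and hostnames are placeholders.

# 1. Make sure PostgreSQL is stopped on the old primary.
pg_ctl -D /var/lib/pgsql/data stop -m fast || true

# 2. Rewind the old primary's data directory against the new primary.
#    pg_rewind requires wal_log_hints=on (or data checksums) on the source.
pg_rewind \
  --target-pgdata=/var/lib/pgsql/data \
  --source-server='host=new-primary user=postgres dbname=postgres'

# 3. Mark it as a standby (PostgreSQL 12+ uses standby.signal), then start
#    it so it catches up through streaming replication.
touch /var/lib/pgsql/data/standby.signal
pg_ctl -D /var/lib/pgsql/data start
```

The alternative to pg_rewind, as the mail says, is simply rebuilding the
standby from a fresh base backup of the new primary.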
>>
>> Please, do not leave dangerous assumptions like "fencing is just an
>> additional precaution" on a public mailing list. It is not, in a lot of
>> situations, PostgreSQL included.
>>
>> I know there are use cases where extreme-HA-failure-coverage is not
>> required. Typically, implementing 80% of the job is enough, or you just
>> make sure the service is up, no matter the data loss. In such cases, maybe
>> you can avoid the complexity of a "state of the art full HA stack with
>> seat-belt, helmet and parachute" and have something cheaper.
>>
>> For instance, Patroni is a very good alternative, but a PostgreSQL-only
>> solution. At least, it has the elegance to use an external DCS for quorum,
>> and a watchdog as a poor man's fencing and self-fencing solution.
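
[Editor's note: as a concrete illustration of the design described above, a
Patroni configuration declares both the external DCS and the watchdog
device. The etcd endpoint below is a made-up placeholder.]

```yaml
# Hypothetical Patroni fragment: quorum via an external DCS (etcd here)
# and self-fencing via the kernel watchdog. Host/port are placeholders.
etcd3:
  hosts: 10.0.0.1:2379

watchdog:
  mode: required        # refuse to run as leader without a working watchdog
  device: /dev/watchdog
  safety_margin: 5      # trigger this many seconds before the leader lease expires
```

With `mode: required`, a node that can no longer reach the DCS to renew its
leader lease is reset by the watchdog before another node can be promoted,
which is the "self-fencing" the mail refers to.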
>>
>>
>>> śr., 13 lut 2019 o 13:10 Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
>>> napisał(a):
>>>
>>>> On Wed, 13 Feb 2019 13:02:30 +0100
>>>> Maciej S <internet at swierki.com> wrote:
>>>>  
>>>>> Thank you all for the answers. I can see your point, but anyway it seems
>>>>> that fencing is like an additional precaution.  
>>>> It's not.
>>>>  
>>>>> If my requirements allow some manual intervention in some cases (e.g.
>>>>> unknown resource state after failover), then I might go ahead without
>>>>> fencing. At least as long as STONITH is not mandatory :)  
>>>> Well, then sooner or later, we'll talk again about how to quickly
>>>> restore your service and/or data. And the answer will be difficult to
>>>> swallow.
>>>>
>>>> Good luck :)
>>>>  
>>>>> pon., 11 lut 2019 o 17:54 Digimer <lists at alteeve.ca> napisał(a):
>>>>>  
>>>>>> On 2019-02-11 6:34 a.m., Maciej S wrote:  
>>>>>>> I was wondering if anyone can give a plain answer if fencing is
>>>>>>> really needed in case there are no shared resources being used (as
>>>>>>> far as I define shared resources).
>>>>>>>
>>>>>>> We want to use PAF or another Postgres failover agent (with
>>>>>>> replicated data files on the local drives) together with Corosync,
>>>>>>> Pacemaker and a virtual IP resource, and I am wondering if there is a
>>>>>>> need for fencing (which is very closely bound to the infrastructure)
>>>>>>> if Pacemaker is already controlling resource state. I know that in a
>>>>>>> failover case there might be a need to add functionality to recover a
>>>>>>> master that entered a dirty shutdown state (e.g. in case of a power
>>>>>>> outage), but I can't see any case where fencing is really necessary.
>>>>>>> Am I wrong?
>>>>>>>
>>>>>>> I was looking for a strict answer but I couldn't find one...
>>>>>>>
>>>>>>> Regards,
>>>>>>> Maciej  
>>>>>> Fencing is as required as wearing a seat belt in a car. You can
>>>>>> physically make things work, but the first time you're "in an
>>>>>> accident", you're screwed.
>>>>>>
>>>>>> Think of it this way;
>>>>>>
>>>>>> If services can run in two or more places at the same time without
>>>>>> coordination, you don't need a cluster; just run things everywhere.
>>>>>> If you need coordination though, you need fencing.
>>>>>>
>>>>>> The role of fencing is to take a node that has entered an unknown
>>>>>> state and force it into a known state. In a system that requires
>>>>>> coordination, fencing is often the only way to ensure sane operation.
>>>>>>
>>>>>> Also, with Pacemaker v2, fencing (stonith) became mandatory at a
>>>>>> programmatic level.
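
[Editor's note: for readers following along, the point above means the
right move is to configure a fence device rather than disable stonith. A
rough pcs-based sketch, with made-up IPMI addresses and credentials; exact
command and parameter names vary between pcs and fence-agents versions.]

```shell
# Hypothetical example: register an IPMI fence device for node1 instead
# of turning stonith off. Address and credentials are placeholders.
pcs stonith create fence-node1 fence_ipmilan \
  ip=10.0.0.11 username=admin password=secret \
  pcmk_host_list=node1

# The cluster-wide stonith-enabled property defaults to true and should
# stay that way; inspect it rather than setting it to false.
pcs property config stonith-enabled
```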
>>>>>>
>>>>>> --
>>>>>> Digimer
>>>>>> Papers and Projects: https://alteeve.com/w/ 
>>>>>> "I am, somehow, less interested in the weight and convolutions of
>>>>>> Einstein’s brain than in the near certainty that people of equal
>>>>>> talent have lived and died in cotton fields and sweatshops." - Stephen
>>>>>> Jay Gould  
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org 
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>
>> Project Home: http://www.clusterlabs.org 
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>> Bugs: http://bugs.clusterlabs.org 
>


