[ClusterLabs] Antw: Re: Is fencing really a must for Postgres failover?

Wed Feb 13 10:29:52 EST 2019

Hi!

I wonder: Can we close this thread with "You have been warned, so please don't
come back later, crying! In the meantime you can do what you want to do."?

Regards,
Ulrich

>>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on 13.02.2019 at
15:05 in message <20190213150549.47634671 at firost>:
> On Wed, 13 Feb 2019 13:50:17 +0100
> Maciej S <internet at swierki.com> wrote:
> 
>> Can you describe at least one situation when it could happen?
>> I see situations where data on two masters can diverge but I can't find the
>> one where data gets corrupted. Or maybe you think that some kind of
>> restoration is required in case of diverged data, but this is not my use
>> case (I can live with a loss of some data on one branch and recover it from
>> working master).
> 
> With imagination and some "if", we can describe some scenarios, but chaos is
> much more creative than me. But anyway, below is a situation:
> 
>   PostgreSQL doesn't do a sanity check when starting as a standby and catching
>   up with a primary. If your old primary crashed and catches up with the new
>   one without some housecleaning first by a human (rebuilding it or using
>   pg_rewind), it will be corrupted.
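
To make the divergence in this scenario concrete, here is a rough sketch of the
comparison a human (or pg_rewind) has to make before letting an old primary
follow a new one. It only assumes the usual textual LSN format ("high/low" in
hexadecimal); the function names and the LSN values are made up for
illustration, and real tooling checks considerably more than this.

```python
def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def needs_rewind(old_primary_lsn: str, fork_lsn: str) -> bool:
    """Hypothetical helper: True when the old primary wrote WAL past the
    point where the new primary's timeline forked, i.e. the two histories
    have diverged and the old primary must be rewound or rebuilt."""
    return parse_lsn(old_primary_lsn) > parse_lsn(fork_lsn)


# Made-up values: the standby was promoted at 0/3000060, but the crashed
# primary had already written WAL up to 0/3000148 before it went down.
print(needs_rewind("0/3000148", "0/3000060"))  # True: diverged
print(needs_rewind("0/3000060", "0/3000060"))  # False: can follow as-is
```

Without fencing, nothing guarantees the old primary stopped writing at the
fork point, so the first case is the one to plan for.
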
> 
> Please, do not leave dangerous assumptions like "fencing is like for
> additional precaution" on a public mailing list. It is not, in a lot of
> situations, PostgreSQL included.
> 
> I know there are use cases where extreme-HA-failure-coverage is not required.
> Typically, implementing 80% of the job is enough, or just making sure the
> service is up, no matter the data loss. In such a case, maybe you can avoid the
> complexity of a "state of the art full HA stack with seat-belt, helmet and
> parachute" and have something cheaper.
> 
> For instance, Patroni is a very good alternative, but a PostgreSQL-only
> solution. At least, it has the elegance to use an external DCS for quorum and
> a watchdog as a poor man's fencing and self-fencing solution.
> 
> 
>> Wed, 13 Feb 2019 at 13:10 Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
>> wrote:
>> 
>> > On Wed, 13 Feb 2019 13:02:30 +0100
>> > Maciej S <internet at swierki.com> wrote:
>> >  
>> > > Thank you all for the answers. I can see your point, but anyway it seems
>> > > that fencing is like for additional precaution.  
>> >
>> > It's not.
>> >  
>> > > If my requirements allow some manual intervention in some cases (e.g.
>> > > unknown resource state after failover), then I might go ahead without
>> > > fencing. At least as long as STONITH is not mandatory :)  
>> >
>> > Well, then sooner or later, we'll talk again about how to quickly restore
>> > your service and/or data. And the answer will be difficult to swallow.
>> >
>> > Good luck :)
>> >  
>> > > Mon, 11 Feb 2019 at 17:54 Digimer <lists at alteeve.ca> wrote:
>> > >  
>> > > > On 2019-02-11 6:34 a.m., Maciej S wrote:  
>> > > > > I was wondering if anyone can give a plain answer if fencing is really
>> > > > > needed in case there are no shared resources being used (as far as I
>> > > > > define shared resource).
>> > > > >
>> > > > > We want to use PAF or another Postgres failover agent (with replicated
>> > > > > data files on the local drives) together with Corosync, Pacemaker and
>> > > > > a virtual IP resource, and I am wondering if there is a need for fencing
>> > > > > (which is very closely bound to the infrastructure) if Pacemaker is
>> > > > > already controlling resource state. I know that in a failover case there
>> > > > > might be a need to add functionality to recover a master that entered
>> > > > > a dirty shutdown state (e.g. in case of power outage), but I can't see any
>> > > > > case where fencing is really necessary. Am I wrong?
>> > > > >
>> > > > > I was looking for a strict answer but I couldn't find one...
>> > > > >
>> > > > > Regards,
>> > > > > Maciej  
>> > > >
>> > > > Fencing is as required as wearing a seat belt in a car. You can
>> > > > physically make things work, but the first time you're "in an accident",
>> > > > you're screwed.
>> > > >
>> > > > Think of it this way;
>> > > >
>> > > > If services can run in two or more places at the same time without
>> > > > coordination, you don't need a cluster, just run things everywhere. If
>> > > > you need coordination though, you need fencing.
>> > > >
>> > > > The role of fencing is to take a node that has entered an unknown
>> > > > state and force it into a known state. In a system that requires
>> > > > coordination, fencing is often the only way to ensure sane operation.
>> > > >
>> > > > Also, with pacemaker v2, fencing (stonith) became mandatory at a
>> > > > programmatic level.
>> > > >
>> > > > --
>> > > > Digimer
>> > > > Papers and Projects: https://alteeve.com/w/ 
>> > > > "I am, somehow, less interested in the weight and convolutions of
>> > > > Einstein’s brain than in the near certainty that people of equal talent
>> > > > have lived and died in cotton fields and sweatshops." - Stephen Jay Gould