[ClusterLabs] Pacemaker Fail Master-Master State

Ken Gaillot kgaillot at redhat.com
Mon Feb 22 12:39:13 EST 2021


On Sun, 2021-02-21 at 12:56 +0300, İsmet BALAT wrote:
> And this state can happen: the master machine can go down and another
> machine can become the new master, and then the first machine can come
> back online as a master too, like in the video. I can't fix this state
> because there is no internet; these machines will be used in an offline
> project. All the other states have worked successfully.

Fencing is necessary for the case where both nodes are up, but unable
to communicate with each other (network issue, CPU load, etc.). It is
also necessary for the case where one node is not working properly and
is unable to stop an active resource. Without fencing, both nodes could
promote to the master role, causing data inconsistencies or even loss.

If you can't get power fencing, there are alternatives.

If your nodes are physical machines with hardware watchdogs, you may be
able to use sbd, if you have either shared storage or a lightweight
third node that can run corosync-qdevice to provide true quorum. Or, if
you have shared SCSI storage, you may be able to use fence_scsi to
fence by cutting off disk access. Or, if you have an intelligent
network switch (that is, one with SNMP-based administration), you may
be able to use fence_snmp to fence by cutting off network access. Or,
if your nodes are virtual machines, and you have access to the host,
you may be able to use fence_virt or fence_xvm.
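
For reference, here is a rough sketch of what those options can look like
with pcs. The resource name, node names, device path, and qdevice host name
below are only placeholders, so adjust them to your environment:

  # SCSI fencing via persistent reservations on a shared disk;
  # "provides=unfencing" lets a fenced node regain disk access when it rejoins
  pcs stonith create scsi-fence fence_scsi \
      pcmk_host_list="node1 node2" \
      devices="/dev/disk/by-id/<shared-disk>" \
      meta provides=unfencing

If you go the sbd-plus-qdevice route instead, the pieces are roughly:

  # on the lightweight third node (needs pcs and corosync-qnetd installed)
  pcs qdevice setup model net --enable --start

  # on the cluster nodes (needs corosync-qdevice installed)
  pcs quorum device add model net host=qnetd-host algorithm=ffsplit

  # then enable watchdog-based sbd (requires a working /dev/watchdog)
  pcs stonith sbd enable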



> 
> On 21 Feb 2021 Sun at 12:45 İsmet BALAT <bcalbatros at gmail.com> wrote:
> > I am testing all the scenarios because I will use real machines with
> > Pacemaker. Scenarios:
> > 
> > 1-
> > node1 master
> > node2 slave
> > Shut down node1, then node2 becomes master.
> > Successful
> > 
> > 2-
> > node1 slave
> > node2 master
> > Shut down node2, then node1 becomes master.
> > Successful
> > 
> > 3-
> > node1 slave
> > node2 slave
> > One node becomes master after 60s.
> > Successful
> > 
> > 4-
> > node1 master
> > node2 master
> > The first machine fails and does not recover unless I send a cleanup
> > command.
> > Fail
> > 
> > I haven't got a physical fencing device. But all machines must stay
> > online for redundancy, so I guess we can't use fencing. The servers
> > have no internet connection or way to get remote help, so they must
> > fix themselves :)
> > 
> > 
> > 
> > On 21 Feb 2021 Sun at 12:14 damiano giuliani <
> > damianogiuliani87 at gmail.com> wrote:
> > > My question is: why are you pausing one VM? Is there any specific
> > > purpose in that? You should never have two master resources;
> > > pausing one VM can cause unexpected behaviour.
> > > If you are testing failovers or simulating faults, you must
> > > configure a fencing mechanism.
> > > Don't expect your cluster to work properly without it.
> > > 
> > > On Sun, 21 Feb 2021, 07:29 İsmet BALAT, <bcalbatros at gmail.com>
> > > wrote:
> > > > Sorry, I am in UTC+3 and was sleeping. I will first try to fix
> > > > the node, then start the cluster. Thank you
> > > > 
> > > > On 21 Feb 2021 Sun at 00:00 damiano giuliani <
> > > > damianogiuliani87 at gmail.com> wrote:
> > > > > With resources configured in master/slave mode, if you get two
> > > > > masters, something is not working right. You should never have
> > > > > two nodes in the master role.
> > > > > Disable the pacemaker and corosync services from autostarting
> > > > > on both nodes:
> > > > > systemctl disable corosync
> > > > > systemctl disable pacemaker
> > > > > 
> > > > > You can start the faulty node using the pcs CLI:
> > > > > pcs cluster start
> > > > > 
> > > > > You can start the whole cluster using:
> > > > > pcs cluster start --all
> > > > > 
> > > > > First of all, configure a fencing mechanism to make the
> > > > > cluster consistent. It's mandatory.
> > > > > 
> > > > > 
> > > > > 
> > > > > On Sat, 20 Feb 2021, 21:47 İsmet BALAT, <bcalbatros at gmail.com
> > > > > > wrote:
> > > > > > I am not using fencing. If I disable pacemaker, how will the
> > > > > > node join the cluster (as in the first example in the video,
> > > > > > the master/slave change)? So I need a check script for fault
> > > > > > states :(
> > > > > > 
> > > > > > And thank you for reply 
> > > > > > 
> > > > > > On 20 Feb 2021 Sat at 23:40 damiano giuliani <
> > > > > > damianogiuliani87 at gmail.com> wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > Have you correctly configured a working fencing mechanism?
> > > > > > > Without it you can't rely on a safe and consistent
> > > > > > > environment.
> > > > > > > My suggestion is to disable the autostart of the services
> > > > > > > (and so the automatic rejoin into the cluster) on both nodes.
> > > > > > > If there is a fault, you have to investigate before you
> > > > > > > rejoin the old faulted master node.
> > > > > > > Pacemaker (and PAF, if you are using it), as far as I know,
> > > > > > > doesn't support auto-healing the old master, so you have to
> > > > > > > resync or pg_rewind every time there is a fault.
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > On Sat, 20 Feb 2021, 19:03 İsmet BALAT, <
> > > > > > > bcalbatros at gmail.com> wrote:
> > > > > > > > I am using Pacemaker with CentOS 8 and PostgreSQL 12.
> > > > > > > > Failover between master/slave states runs successfully.
> > > > > > > > But if all nodes become masters, Pacemaker can't repair
> > > > > > > > it unless I send the command 'pcs resource cleanup', even
> > > > > > > > though I set 60s in the resource config. How can I fix it?
> > > > > > > > 
> > > > > > > > StackOverFlow link: 
> > > > > > > > https://stackoverflow.com/questions/66292304/pacemaker-postgresql-master-master-state
> > > > > > > > 
> > > > > > > > Thanks
> > > > > > > > 
> > > > > > > > İsmet BALAT
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot <kgaillot at redhat.com>



More information about the Users mailing list