[ClusterLabs] Re: the PAF switchover does not happen if the VIP resource is stopped

Thu Apr 26 08:41:22 UTC 2018

Does this mean that once a node has had a resource failure, it can no longer be promoted to master unless I run "pcs resource cleanup" to clear the failcount?

I am testing whether the cluster can still work when the VIP resource goes down for some reason. That is why I only ifdown the VIP network card (enp0s3), not the heartbeat network card (enp0s8).

-----Original Message-----
From: Jehan-Guillaume de Rorthais [mailto:jgdr at dalibo.com]
Sent: 26 April 2018 16:02
To: 范国腾 <fanguoteng at highgo.com>
Cc: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>; 李梦怡 <limengyi at highgo.com>
Subject: Re: [ClusterLabs] the PAF switchover does not happen if the VIP resource is stopped

On Thu, 26 Apr 2018 07:53:07 +0000
范国腾 <fanguoteng at highgo.com> wrote:

> 1. There is no failure in the initial state. sds1 is master.
> 
> [cid:image001.png at 01D3DD75.3F4BF110]

yes.

> 2. ifdown the sds1 VIP network card.
> 
> [cid:image002.png at 01D3DD75.71D5DE70]

OK, a failcount and a -inf score appear.

> 3. ifup the sds1 VIP network card and then ifdown the sds2 VIP network card
> 
> [cid:image003.png at 01D3DD76.26C5E820]

Now there are failcounts and -inf scores everywhere.
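
The failcounts and scores could be inspected on each node, and cleared afterwards, with something like the sketch below. It only relies on commands already named in this thread (pcs resource failcount, crm_simulate -sL, pcs resource cleanup). The "show" subcommand form, the node names sds1/sds2 as defaults, and the run/inspect_failures/clear_failures helpers are assumptions made for illustration, to be adapted to the actual setup.

```shell
#!/bin/sh
# Sketch only: node names and helper names are placeholders for illustration.
NODES="${NODES:-sds1 sds2}"

# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the failcount of one resource on every node, then all allocation scores.
inspect_failures() {
    resource="$1"
    for node in $NODES; do
        run pcs resource failcount show "$resource" "$node"
    done
    run crm_simulate -sL
}

# Clear the recorded failures (failcount) for one resource.
clear_failures() {
    run pcs resource cleanup "$1"
}
```

Calling inspect_failures with a resource id while DRY_RUN=1 is set prints the commands without touching the cluster, which is a safe way to check them first.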

I'm not sure I understand your mail. Do you have a question?

> -----Original Message-----
> From: Jehan-Guillaume de Rorthais [mailto:jgdr at dalibo.com]
> Sent: 26 April 2018 15:07
> To: 范国腾 <fanguoteng at highgo.com>
> Cc: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>; 李梦怡 <limengyi at highgo.com>
> Subject: Re: [ClusterLabs] the PAF switchover does not happen if the VIP resource is stopped
> 
> On Thu, 26 Apr 2018 02:53:33 +0000
> 范国腾 <fanguoteng at highgo.com> wrote:
> 
> > Hi Rorthais,
> > 
> > Thank you for your help.
> > 
> > Replication was working at that time.
> > 
> > I tried again today.
> > (1) If I run "ifup enp0s3" on node2, then run "ifdown enp0s3" on
> > node1, the switchover issue can be reproduced.
> > (2) But if I run "ifup enp0s3" on node2, run "pcs resource cleanup
> > mastergroup" to clean up the VIP resource so that there are no Failed
> > Actions in "pcs status", and then run "ifdown enp0s3" on node1, it
> > works. The switchover happens again.
> > 
> > Is there any parameter to control this behavior so that I don't need
> > to execute the "pcs cleanup" command every time?
> 
> Check the failcounts for each resource on each node (pcs resource
> failcount [...]).
> 
> Check the scores as well (crm_simulate -sL).
> 
> > -----Original Message-----
> > From: Jehan-Guillaume de Rorthais [mailto:jgdr at dalibo.com]
> > Sent: 25 April 2018 18:39
> > To: 范国腾 <fanguoteng at highgo.com>
> > Cc: Cluster Labs - All topics related to open-source clustering
> > welcomed <users at clusterlabs.org>; 李梦怡 <limengyi at highgo.com>
> > Subject: Re: [ClusterLabs] the PAF switchover does not happen if the
> > VIP resource is stopped
> > 
> > On Wed, 25 Apr 2018 08:58:34 +0000
> > 范国腾 <fanguoteng at highgo.com> wrote:
> > 
> > > Our lab has two resources: (1) PAF (master/slave) and (2) a VIP (bound
> > > to the master PAF node). The configuration is in the attachment.
> > > 
> > > Each node has two network cards: one (enp0s8) is for the Pacemaker
> > > heartbeat on the internal network, the other (enp0s3) is for the
> > > master VIP on the external network.
> > > 
> > > We are testing the following case: if the master VIP network card
> > > goes down, the master postgres and the VIP can switch to another node.
> > > 
> > > 1. At first, node2 is master. I run "ifdown enp0s3" on node2, then
> > > node1 becomes the master. That is OK.
> > > 
> > > 2. Then I run "ifup enp0s3" on node2, wait for 60 seconds,
> > 
> > Did you check that the PostgreSQL instances were replicating again?
> > 
> > > then run "ifdown enp0s3" on node1, but node1 is still master.
> > > Why doesn't the switchover happen? How can I recover so that the
> > > system works?

-- 
Jehan-Guillaume de Rorthais
Dalibo