[Pacemaker] pacemaker dev lead suggesting s/w upgrade

Tue May 7 18:45:01 EDT 2013

Please keep all questions on the mailing list... I don't have the bandwidth for 1-1 support.
At a minimum, include (as attachments) logs from all machines.

Also, since you're upgrading, can I suggest you go with 1.1.10-rc2?
Even though it's "only" a release candidate, it's far superior to 1.1.8.

-- Andrew

On 08/05/2013, at 12:12 AM, Vinod Prabhu <Vinod.Prabhu at ipaccess.com> wrote:

> Hi Andrew,
>  
> Greetings,
> We are using corosync/pacemaker for high availability.
> This is a 4-node HA cluster in which each pair of nodes is configured for DB and file system replication, so there are 2 DRBD pairs in total.
> I have attached the output of “crm configure show” for reference.
> The issue appeared after upgrading the pacemaker and corosync versions:
> Pacemaker: 1.1.5 to 1.1.8
> Corosync: 1.2.7 to 1.4.1
>  
> We use the following procedure to upgrade (the repo file is downloaded with wget -O /etc/yum.repos.d/pacemaker.repo http://clusterlabs.org/rpm-next/rhel-5/clusterlabs.repo):
>  
> crm configure save > local.conf
> crm node standby <on-each-node>
> service corosync stop <on-each-node>
> yum remove -y corosync
> yum remove -y corosync-debuginfo
> yum remove -y pacemaker-debuginfo
> yum remove -y heartbeat-libs
> yum remove -y heartbeat-debuginfo
> yum install -y pacemaker corosync
> cd /root/
> rpm -ivh crmsh-1.2.5-55.3.x86_64.rpm crmsh-debuginfo-1.2.5-55.3.x86_64.rpm
> service corosync start <on-each-node>
> crm node online <on-each-node>
> crm configure load replace local.conf
>  
> After the last step, one of the nodes is frozen. Can you help with this? Is there any other information you require, or is the upgrade procedure missing a step?
>  
> Vinod
>  
> From: Babu Challa 
> Sent: Tuesday, May 07, 2013 3:17 PM
> To: Vinod Prabhu
> Subject: FW: pacemaker dev lead suggesting s/w upgrade
>  
> FYI
>  
> R
> Babu Challa
> T: +44 (0) 1954 717972 | M: +44 (0) 7912 859958| E: babu.challa at ipaccess.com | W: www.ipaccess.com
> ip.access Ltd, Building 2020, Cambourne Business Park, Cambourne, Cambridge, CB23 6DW
>  
> The desire to excel is exclusive of the fact whether someone else appreciates it or not. "Excellence" is a drive from inside, not outside. Excellence is not for someone else to notice but for your own satisfaction and efficiency...
>  
> From: Babu Challa 
> Sent: 02 May 2013 10:27
> To: Vinod Prabhu; Michael van der Westhuizen; Dinesh Arney
> Cc: Karthik Ganesan; Gavin Stevens; Mandar Magikar
> Subject: pacemaker dev lead suggesting s/w upgrade
>  
> Hi All,
>  
> I asked Andrew Beekhof (team leader for Pacemaker development) for his input on the HA issues. He believes there was a bug in pacemaker and suggests a pacemaker upgrade.
>  
> Please find the email below; his reply is in bold green.
>  
>  
>  
> On 01/05/2013, at 11:55 PM, Babu Challa <Babu.Challa at ipaccess.com> wrote:
>  
> Hi Andrew,
> Thanks for the reply. I have now managed to reproduce the issue. I am enclosing the steps here for the pacemaker team, and would appreciate their advice on resolving it.
>  
> Update your software.
>  
>  
> -----Original Message-----
> From: Andrew Beekhof [mailto:andrew at beekhof.net] 
> Sent: 01 May 2013 01:20
> To: Babu Challa
> Cc: The Pacemaker cluster resource manager
> Subject: Re: corosync restarts service when slave node joins the cluster
>  
> Hi Andrew,
>  
> Greetings,
>  
> We are using corosync/pacemaker for high availability.
> 
> This is a 4-node HA cluster in which each pair of nodes is configured for DB and file system replication. We have a very tricky situation: we have configured two clusters with exactly the same configuration, but on one of them corosync restarts the services when the slave node is rebooted and rejoins the cluster.
> 
> We have tried to reproduce the issue on the other cluster with multiple HA scenarios, but with no luck.
>  
> A few questions:
> 
> 1. If the rebooted slave is the DC (Designated Controller), could that cause this issue?
> 2. Is there any known issue in the pacemaker version we are currently using (1.1.5) that would be resolved by upgrading to the latest (1.8)?
>  
> I believe there was one, check the ChangeLog
>  
> 3. Is there any chance that pacemaker/corosync behaves differently even though the configuration is the same on each cluster?
>  
> Timing issues do occur. How identical is the hardware?
>  
> 4. Can you please let us know of any possible reason for this issue? That would really help us reproduce and fix it.
>  
> More than likely it has been fixed in a later version.
>  
>  
> Versions we are using:
> 
> Pacemaker version - pacemaker-1.1.5
> Corosync version - corosync-1.2.7
> heartbeat-3.0.3-2.3
>  
>  
> This message contains confidential information and may be privileged. If you are not the intended recipient, please notify the sender and delete the message immediately.
>  
> ip.access ltd, registration number 3400157, Building 2020, Cambourne
>  Business Park, Cambourne, Cambridge CB23 6DW, United Kingdom
>  
>  
> <nos.conf>