[ClusterLabs] crm node stays online after issuing node standby command
    Ken Gaillot 
    kgaillot at redhat.com
       
    Wed Mar 15 10:46:31 EDT 2023
Hi,
If you can reproduce the problem, the following info would be helpful:
* "cibadmin -Q | grep standby": to show whether the change was successfully
recorded in the CIB (it will show info for any node with standby set, but
the XML ID likely has the node name or ID in it)
* "attrd_updater -Q -n standby -N FILE-2": to show whether the
attribute manager has the right value in memory for the affected node
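If it helps, both checks can be bundled into a small shell function so they are run the same way on every node while you try to reproduce the problem. This is only a sketch: the function name show_standby_state is hypothetical, and it does nothing beyond running the two queries above and labelling their output.

```sh
# Hypothetical helper (not part of Pacemaker): print what the CIB and the
# attribute manager each report about the "standby" attribute of one node.
show_standby_state() {
    node="$1"
    echo "== CIB entries containing 'standby' =="
    # Lists standby nvpairs for every node; the XML id usually embeds
    # the node name or node ID.
    cibadmin -Q | grep standby
    echo "== attrd value for $node =="
    # Value the attribute manager currently holds in memory for this node.
    attrd_updater -Q -n standby -N "$node"
}
```

Usage: run "show_standby_state FILE-2" on the affected node right after the problem shows up, and compare the two sections.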
On Wed, 2023-03-15 at 15:51 +0530, Ayush Siddarath wrote:
> Hi All, 
> 
> We are seeing an issue during crm maintenance operations. As part of
> the upgrade process, the crm nodes are put into standby mode.
> However, one of the nodes sometimes fails to go into standby mode
> even though "crm node standby" returns success.
> 
> Commands issued to put the nodes into standby for maintenance:
> 
> > [2023-03-15 06:07:08 +0000] [468] [INFO] changed: [FILE-1] =>
> > {"changed": true, "cmd": "/usr/sbin/crm node standby FILE-1",
> > "delta": "0:00:00.442615", "end": "2023-03-15 06:07:08.150375",
> > "rc": 0, "start": "2023-03-15 06:07:07.707760", "stderr": "",
> > "stderr_lines": [], "stdout": "\u001b[32mINFO\u001b[0m: standby
> > node FILE-1", "stdout_lines": ["\u001b[32mINFO\u001b[0m: standby
> > node FILE-1"]}
> > .
> > [2023-03-15 06:07:08 +0000] [468] [INFO] changed: [FILE-2] =>
> > {"changed": true, "cmd": "/usr/sbin/crm node standby FILE-2",
> > "delta": "0:00:00.459407", "end": "2023-03-15 06:07:08.223749",
> > "rc": 0, "start": "2023-03-15 06:07:07.764342", "stderr": "",
> > "stderr_lines": [], "stdout": "\u001b[32mINFO\u001b[0m: standby
> > node FILE-2", "stdout_lines": ["\u001b[32mINFO\u001b[0m: standby
> > node FILE-2"]}
> 
>       ........ 
> 
> crm status output after running the above commands:
> 
> > FILE-2:/var/log # crm status
> > Cluster Summary:
> >   * Stack: corosync
> >   * Current DC: FILE-1 (version 2.1.2+20211124.ada5c3b36-150400.2.43-2.1.2+20211124.ada5c3b36) - partition with quorum
> >   * Last updated: Wed Mar 15 08:32:27 2023
> >   * Last change:  Wed Mar 15 06:07:08 2023 by root via cibadmin on FILE-4
> >   * 4 nodes configured
> >   * 11 resource instances configured (5 DISABLED)
> > Node List:
> >   * Node FILE-1: standby (with active resources)
> >   * Node FILE-3: standby (with active resources)
> >   * Node FILE-4: standby (with active resources)
> >   * Online: [ FILE-2 ]
> 
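Because "crm node standby" returned rc 0 for FILE-2 while the status output above still lists it under "Online", an upgrade script could check the status output itself before continuing. A minimal sketch, assuming the "crm status" text layout shown above (standby nodes appear as "Node X: standby", the rest on the "Online: [ ... ]" line); the function name nodes_still_online is hypothetical:

```sh
# Hypothetical helper: read "crm status" text on stdin and print each node
# listed on the "Online: [ ... ]" line, one per line. Relies on the text
# layout shown above; nodes in standby are listed as "Node X: standby".
nodes_still_online() {
    # Keep only what sits between "Online: [ " and " ]", then split on spaces.
    sed -n 's/^.*Online: \[ \(.*\) ].*$/\1/p' | tr ' ' '\n' | sed '/^$/d'
}
```

For example, a playbook step could stop the upgrade when "crm status | nodes_still_online" prints anything. Parsing human-readable output is fragile across versions, so this is a guard rail rather than a fix.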
> The pacemaker logs indicate that FILE-2 received the commands to put it
> into standby:
> 
> > FILE-2:/var/log # grep standby /var/log/pacemaker/pacemaker.log
> > Mar 15 06:07:08.098 FILE-2 pacemaker-based     [8635] (cib_perform_op)  info: ++
> >     <nvpair id="num-1-instance_attributes-standby" name="standby" value="on"/>
> > Mar 15 06:07:08.166 FILE-2 pacemaker-based     [8635] (cib_perform_op)  info: ++
> >     <nvpair id="num-3-instance_attributes-standby" name="standby" value="on"/>
> > Mar 15 06:07:08.170 FILE-2 pacemaker-based     [8635] (cib_perform_op)  info: ++
> >     <nvpair id="num-2-instance_attributes-standby" name="standby" value="on"/>
> > Mar 15 06:07:08.230 FILE-2 pacemaker-based     [8635] (cib_perform_op)  info: ++
> >     <nvpair id="num-4-instance_attributes-standby" name="standby" value="on"/>
> 
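The log lines above identify each change only by its XML id (num-1 through num-4), not by node name. One way to line those up with node names is to read the nodes section of the CIB. A rough sketch, assuming cibadmin prints one XML element per line with the node name in the uname attribute of each node element; the awk-based parser and the function name standby_by_node are hypothetical, and a real XML tool would be more robust:

```sh
# Hypothetical helper: read CIB XML on stdin and print "<uname> <standby>"
# for each <node> element, or "(unset)" when no standby nvpair is present.
# Assumes one element per line, as cibadmin normally pretty-prints it.
standby_by_node() {
    awk '
        /<node / {
            uname = ""
            value = "(unset)"
            if (match($0, /uname="[^"]*"/))
                uname = substr($0, RSTART + 7, RLENGTH - 8)
            # A self-closing <node .../> has no attributes nested inside it.
            if ($0 ~ /\/>[ \t]*$/) { print uname, value; uname = "" }
            next
        }
        uname != "" && / name="standby"/ {
            if (match($0, /value="[^"]*"/))
                value = substr($0, RSTART + 7, RLENGTH - 8)
        }
        uname != "" && /<\/node>/ { print uname, value; uname = "" }
    '
}
```

Usage: "cibadmin -Q -o nodes | standby_by_node" should list every configured node with its recorded standby value, which can then be compared with what crm status shows.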
> 
> The issue is intermittent and has been observed on other nodes as
> well. We have seen a similar issue when we try to take nodes out of
> standby mode (using the "crm node online" command): one or more nodes
> fail to leave standby mode.
> 
> We suspect it could be related to running the node standby/online
> commands in parallel for all nodes, but this issue wasn't observed
> with the pacemaker version packaged with SLES15 SP2.
> 
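To test the parallel-execution theory, the standby step could be run one node at a time, waiting for each node to actually show as standby before moving on. A sketch under stated assumptions: it uses only the "crm node standby" and "crm status" commands already shown in this thread, relies on the status text format quoted above, and the function name standby_serially and the STANDBY_TRIES / STANDBY_DELAY knobs are hypothetical. The same pattern would apply to "crm node online".

```sh
# Hypothetical helper: put nodes into standby one at a time instead of in
# parallel, then poll "crm status" until it reports "Node <name>: standby".
# STANDBY_TRIES (default 30) and STANDBY_DELAY in seconds (default 2) are
# hypothetical tuning knobs for this sketch.
standby_serially() {
    for node in "$@"; do
        crm node standby "$node" || return 1
        tries=0
        # -F: match the node name literally, even if it contains dots.
        until crm status | grep -qF "Node $node: standby"; do
            tries=$((tries + 1))
            if [ "$tries" -ge "${STANDBY_TRIES:-30}" ]; then
                echo "$node did not report standby" >&2
                return 1
            fi
            sleep "${STANDBY_DELAY:-2}"
        done
        echo "$node is in standby"
    done
}
```

Usage: "standby_serially FILE-1 FILE-2 FILE-3 FILE-4". If the problem disappears when the nodes are done serially, that points at the concurrent updates; if it still happens, the per-node checks from the top of this reply become more interesting.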
> I'm attaching the pacemaker.log from FILE-2 for analysis. Let us know
> if any additional information is required. 
> 
> OS: SLES15 SP4
> Pacemaker version (crmadmin --version):
> Pacemaker 2.1.2+20211124.ada5c3b36-150400.2.43
> 
> Thanks,
> Ayush 
-- 
Ken Gaillot <kgaillot at redhat.com>