[ClusterLabs] crm node stays online after issuing node standby command

Ken Gaillot kgaillot at redhat.com
Wed Mar 15 11:46:50 EDT 2023


Don't worry about attrd_updater: standby is recorded as a permanent
node attribute, so you'd use crm_attribute to query it instead.
But from the CIB we can see that the attribute was not successfully
recorded for the affected node, even though the logs say it was. That's
concerning and may indicate a regression. I'll try to reproduce it on
my end.
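For reference, a quick sketch of the two query styles (node names taken
from this thread; this obviously needs a live cluster to run):
attrd_updater talks to the attribute manager's transient attributes, so
it won't see a permanent standby attribute, while crm_attribute can
query the permanent copy in the CIB.

```shell
# attrd_updater queries transient (in-memory) node attributes, so a
# permanent standby attribute is expected to come back "does not exist":
attrd_updater -Q -n standby -N FILE-3

# crm_attribute can query the permanent attribute recorded in the CIB;
# "-l forever" selects the permanent lifetime explicitly:
crm_attribute -N FILE-3 -n standby -l forever --query
```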

On Wed, 2023-03-15 at 21:01 +0530, Ayush Siddarath wrote:
> Hi Ken, 
> 
> Somehow I didn't receive the email for your response. 
> 
> The system is currently in the same state and here are the required
> command outputs: 
> 
> > FILE-2:~ # cibadmin -Q | grep standby
> >           <nvpair id="num-1-instance_attributes-standby" name="standby" value="on"/>
> >           <nvpair id="num-3-instance_attributes-standby" name="standby" value="on"/>
> >           <nvpair id="num-4-instance_attributes-standby" name="standby" value="on"/>
> 
> Running into some syntax issues when issuing the attrd_updater
> command. Could you review the commands? 
> 
> > FILE-2:~ # attrd_updater -Q --name="standby" -N FILE-3
> > Could not query value of standby: attribute does not exist
> > FILE-2:~ # attrd_updater -Q -n standby -N FILE-3
> > Could not query value of standby: attribute does not exist
> > FILE-2:~ # attrd_updater -Q -n standby -N FILE-2
> > Could not query value of standby: attribute does not exist
> 
> cibadmin -Q --> 
> 
>     </crm_config>
>     <nodes>
>       <node id="1" uname="FILE-1">
>         <instance_attributes id="num-1-instance_attributes">
>           <nvpair id="num-1-instance_attributes-standby" name="standby" value="on"/>
>         </instance_attributes>
>       </node>
>       <node id="2" uname="FILE-2"/>
>       <node id="3" uname="FILE-3">
>         <instance_attributes id="num-3-instance_attributes">
>           <nvpair id="num-3-instance_attributes-standby" name="standby" value="on"/>
>         </instance_attributes>
>       </node>
>       <node id="4" uname="FILE-4">
>         <instance_attributes id="num-4-instance_attributes">
>           <nvpair id="num-4-instance_attributes-standby" name="standby" value="on"/>
>         </instance_attributes>
>       </node>
> 
> After a few minutes, re-running the node standby command for the same
> node works fine. 
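Since a delayed retry succeeds, one workaround is to verify the
attribute after "crm node standby" and re-issue the command if it did
not stick. This is only a sketch, not something from the thread: the
helper takes the set/check commands as arguments so the retry logic
itself can be exercised with stubs, and the crm/crm_attribute calls in
the comments are the ones it would wrap on a real cluster.

```shell
# Hypothetical retry helper: issue a "set" command, give the cluster a
# moment, then confirm with a "check" command; re-issue on failure.
standby_with_retry() {
  set_cmd=$1      # e.g. "crm node standby FILE-2"
  check_cmd=$2    # e.g. "crm_attribute -N FILE-2 -n standby --query"
  attempts=${3:-3}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    $set_cmd || return 1                      # command itself failed
    sleep 2                                   # let attrd/CIB settle
    $check_cmd >/dev/null 2>&1 && return 0    # attribute visible: done
    i=$((i + 1))
  done
  return 1                                    # never became visible
}

# Real-cluster usage would look like (illustrative only):
#   standby_with_retry "crm node standby FILE-2" \
#                      "crm_attribute -N FILE-2 -n standby --query"
```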
> 
> Thanks,
> Ayush 
> 
> On Wed, Mar 15, 2023 at 8:55 PM Priyanka Balotra <
> priyanka.14balotra at gmail.com> wrote:
> > +Ayush
> > 
> > Thanks
> > 
> > 
> > On Wed, 15 Mar 2023 at 8:17 PM, Ken Gaillot <kgaillot at redhat.com>
> > wrote:
> > > Hi,
> > > 
> > > If you can reproduce the problem, the following info would be
> > > helpful:
> > > 
> > > * "cibadmin -Q | grep standby" : to show whether it was
> > > successfully
> > > recorded in the CIB (will show info for any node with standby,
> > > but the
> > > XML ID likely has the node name or ID in it)
> > > 
> > > * "attrd_updater -Q -n standby -N FILE-2" : to show whether the
> > > attribute manager has the right value in memory for the affected
> > > node
> > > 
> > > 
> > > On Wed, 2023-03-15 at 15:51 +0530, Ayush Siddarath wrote:
> > > > Hi All, 
> > > > 
> > > > We are seeing an issue during crm maintenance operations. As
> > > > part of the upgrade process, the crm nodes are put into standby
> > > > mode, but one of the nodes fails to go into standby despite
> > > > "crm node standby" returning success. 
> > > > 
> > > > Commands issued to put the nodes into standby: 
> > > > 
> > > > > [2023-03-15 06:07:08 +0000] [468] [INFO] changed: [FILE-1] =>
> > > > > {"changed": true, "cmd": "/usr/sbin/crm node standby FILE-1",
> > > > > "delta": "0:00:00.442615", "end": "2023-03-15 06:07:08.150375",
> > > > > "rc": 0, "start": "2023-03-15 06:07:07.707760", "stderr": "",
> > > > > "stderr_lines": [], "stdout": "\u001b[32mINFO\u001b[0m: standby node FILE-1",
> > > > > "stdout_lines": ["\u001b[32mINFO\u001b[0m: standby node FILE-1"]}
> > > > > .
> > > > > [2023-03-15 06:07:08 +0000] [468] [INFO] changed: [FILE-2] =>
> > > > > {"changed": true, "cmd": "/usr/sbin/crm node standby FILE-2",
> > > > > "delta": "0:00:00.459407", "end": "2023-03-15 06:07:08.223749",
> > > > > "rc": 0, "start": "2023-03-15 06:07:07.764342", "stderr": "",
> > > > > "stderr_lines": [], "stdout": "\u001b[32mINFO\u001b[0m: standby node FILE-2",
> > > > > "stdout_lines": ["\u001b[32mINFO\u001b[0m: standby node FILE-2"]}
> > > > 
> > > >       ........ 
> > > > 
> > > > "crm status" output after executing the above commands: 
> > > > 
> > > > > FILE-2:/var/log # crm status
> > > > > Cluster Summary:
> > > > >   * Stack: corosync
> > > > >   * Current DC: FILE-1 (version 2.1.2+20211124.ada5c3b36-
> > > > > 150400.2.43-2.1.2+20211124.ada5c3b36) - partition with quorum
> > > > >   * Last updated: Wed Mar 15 08:32:27 2023
> > > > >   * Last change:  Wed Mar 15 06:07:08 2023 by root via cibadmin on FILE-4
> > > > >   * 4 nodes configured
> > > > >   * 11 resource instances configured (5 DISABLED)
> > > > > Node List:
> > > > >   * Node FILE-1: standby (with active resources)
> > > > >   * Node FILE-3: standby (with active resources)
> > > > >   * Node FILE-4: standby (with active resources)
> > > > >   * Online: [ FILE-2 ]
> > > > 
> > > > The pacemaker logs indicate that FILE-2 received the commands
> > > > to put it into standby. 
> > > > 
> > > > > FILE-2:/var/log # grep standby /var/log/pacemaker/pacemaker.log
> > > > > Mar 15 06:07:08.098 FILE-2 pacemaker-based [8635] (cib_perform_op)  info: ++ <nvpair id="num-1-instance_attributes-standby" name="standby" value="on"/>
> > > > > Mar 15 06:07:08.166 FILE-2 pacemaker-based [8635] (cib_perform_op)  info: ++ <nvpair id="num-3-instance_attributes-standby" name="standby" value="on"/>
> > > > > Mar 15 06:07:08.170 FILE-2 pacemaker-based [8635] (cib_perform_op)  info: ++ <nvpair id="num-2-instance_attributes-standby" name="standby" value="on"/>
> > > > > Mar 15 06:07:08.230 FILE-2 pacemaker-based [8635] (cib_perform_op)  info: ++ <nvpair id="num-4-instance_attributes-standby" name="standby" value="on"/>
> > > > 
> > > > 
> > > > The issue is quite intermittent and has been observed on other
> > > > nodes as well. 
> > > > We have seen a similar issue when removing nodes from standby
> > > > mode (using "crm node online"): one or more nodes fail to come
> > > > out of standby. 
> > > > 
> > > > We suspect an issue with parallel execution of the node
> > > > standby/online commands for all the nodes, but this wasn't
> > > > observed with the pacemaker packaged with SLES15 SP2. 
> > > > 
> > > > I'm attaching the pacemaker.log from FILE-2 for analysis. Let us
> > > > know if any additional information is required. 
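If the parallel-execution suspicion holds, serializing the standby
calls and waiting for each attribute to land would be an obvious
mitigation to try. A sketch only: standby_all is a hypothetical helper
that takes command prefixes so the loop can be exercised with stubs;
on a real cluster the prefixes would be the crm/crm_attribute calls
shown at the end.

```shell
# Hypothetical serializer: put nodes into standby one at a time, waiting
# (bounded) until the attribute is queryable before touching the next node.
standby_all() {
  set_prefix=$1     # node name is appended, e.g. "crm node standby"
  check_prefix=$2   # node name is appended, e.g. "crm_attribute -n standby --query -N"
  shift 2
  for node in "$@"; do
    $set_prefix "$node" || return 1
    n=0
    until $check_prefix "$node" >/dev/null 2>&1; do
      n=$((n + 1))
      [ "$n" -ge 30 ] && return 1   # give up after ~30s per node
      sleep 1
    done
  done
}

# Real-cluster usage (illustrative only):
#   standby_all "crm node standby" \
#               "crm_attribute -n standby --query -N" \
#               FILE-1 FILE-2 FILE-3 FILE-4
```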
> > > > 
> > > > OS: SLES15 SP4
> > > > Pacemaker version --> 
> > > >  crmadmin --version
> > > > Pacemaker 2.1.2+20211124.ada5c3b36-150400.2.43
> > > > 
> > > > Thanks,
> > > > Ayush 
> > > > 
> > > > _______________________________________________
> > > > Manage your subscription:
> > > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > > 
> > > > ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot <kgaillot at redhat.com>


