[ClusterLabs] crm node stays online after issuing node standby command

Ayush Siddarath aayush23595 at gmail.com
Wed Mar 15 11:31:17 EDT 2023


Hi Ken,

Somehow I didn't receive the email for your response.

The system is still in the same state; here are the requested command
outputs:

FILE-2:~ # cibadmin -Q | grep standby
          <nvpair id="num-1-instance_attributes-standby" name="standby"
value="on"/>
          <nvpair id="num-3-instance_attributes-standby" name="standby"
value="on"/>
          <nvpair id="num-4-instance_attributes-standby" name="standby"
value="on"/>


I'm running into some syntax issues when issuing the attrd_updater
command. Could you review the commands below?

FILE-2:~ # attrd_updater -Q --name="standby" -N FILE-3
Could not query value of standby: attribute does not exist


FILE-2:~ # attrd_updater -Q -n standby -N FILE-3
Could not query value of standby: attribute does not exist
FILE-2:~ # attrd_updater -Q -n standby -N FILE-2
Could not query value of standby: attribute does not exist
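In case it helps to compare the two views side by side, here is a minimal
diagnostic sketch. It assumes Pacemaker's crm_attribute and attrd_updater
are in PATH and uses the node names from this cluster; the fallback
messages are mine, not tool output.

```shell
# For each node, print the CIB's view of "standby" (permanent node
# attribute, queried with crm_attribute -G) next to the attribute
# manager's in-memory view (attrd_updater -Q), so a mismatch between
# the two is easy to spot.
standby_views() {
    node=$1
    echo "== $node =="
    # CIB view: -G/--query reads the attribute value from the CIB
    crm_attribute -N "$node" -n standby -G 2>/dev/null \
        || echo "standby not set in CIB"
    # attrd view: -Q queries pacemaker-attrd's in-memory value
    attrd_updater -Q -n standby -N "$node" 2>/dev/null \
        || echo "standby not known to attrd"
}

for n in FILE-1 FILE-2 FILE-3 FILE-4; do
    standby_views "$n"
done
```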


cibadmin -Q -->

    </crm_config>
    <nodes>
      <node id="1" uname="FILE-1">
        <instance_attributes id="num-1-instance_attributes">
          <nvpair id="num-1-instance_attributes-standby" name="standby"
value="on"/>
        </instance_attributes>
      </node>
      <node id="2" uname="FILE-2"/>
      <node id="3" uname="FILE-3">
        <instance_attributes id="num-3-instance_attributes">
          <nvpair id="num-3-instance_attributes-standby" name="standby"
value="on"/>
        </instance_attributes>
      </node>
      <node id="4" uname="FILE-4">
        <instance_attributes id="num-4-instance_attributes">
          <nvpair id="num-4-instance_attributes-standby" name="standby"
value="on"/>
        </instance_attributes>
      </node>

After a few minutes, re-running the node standby command for the same node
works fine.
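Since a later re-run succeeds, as a stopgap we are considering wrapping
the standby command in a verify-and-retry loop rather than firing it once
per node in parallel. A sketch: the retry helper below is generic shell;
the crm invocation and the "standby_in_cib" check in the comment are
assumptions, not tested commands.

```shell
# retry_until MAX ACTION CHECK: run ACTION, then CHECK; repeat until
# CHECK succeeds or MAX attempts are exhausted. Returns 0 on success,
# 1 if all attempts failed.
retry_until() {
    max=$1; action=$2; check=$3
    i=0
    while [ "$i" -lt "$max" ]; do
        $action
        if $check; then
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# On the cluster this might look like (not run here; standby_in_cib
# would be a hypothetical check, e.g. grepping cibadmin -Q for the
# node's standby nvpair):
#   retry_until 5 "crm node standby FILE-2" "standby_in_cib FILE-2"
```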

Thanks,
Ayush

On Wed, Mar 15, 2023 at 8:55 PM Priyanka Balotra <
priyanka.14balotra at gmail.com> wrote:

> +Ayush
>
> Thanks
>
>
> On Wed, 15 Mar 2023 at 8:17 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>
>> Hi,
>>
>> If you can reproduce the problem, the following info would be helpful:
>>
>> * "cibadmin -Q | grep standby" : to show whether it was successfully
>> recorded in the CIB (will show info for any node with standby, but the
>> XML ID likely has the node name or ID in it)
>>
>> * "attrd_updater -Q -n standby -N FILE-2" : to show whether the
>> attribute manager has the right value in memory for the affected node
>>
>>
>> On Wed, 2023-03-15 at 15:51 +0530, Ayush Siddarath wrote:
>> > Hi All,
>> >
>> > We are seeing an issue during crm maintenance operations. As part of
>> > the upgrade process, the cluster nodes are put into standby mode, but
>> > one of the nodes fails to go into standby even though "crm node
>> > standby" returns success.
>> >
>> > Commands issued to put the nodes into standby:
>> >
>> > > [2023-03-15 06:07:08 +0000] [468] [INFO] changed: [FILE-1] =>
>> > > {"changed": true, "cmd": "/usr/sbin/crm node standby FILE-1",
>> > > "delta": "0:00:00.442615", "end": "2023-03-15 06:07:08.150375",
>> > > "rc": 0, "start": "2023-03-15 06:07:07.707760", "stderr": "",
>> > > "stderr_lines": [], "stdout": "\u001b[32mINFO\u001b[0m: standby
>> > > node FILE-1", "stdout_lines": ["\u001b[32mINFO\u001b[0m: standby
>> > > node FILE-1"]}
>> > > .
>> > > [2023-03-15 06:07:08 +0000] [468] [INFO] changed: [FILE-2] =>
>> > > {"changed": true, "cmd": "/usr/sbin/crm node standby FILE-2",
>> > > "delta": "0:00:00.459407", "end": "2023-03-15 06:07:08.223749",
>> > > "rc": 0, "start": "2023-03-15 06:07:07.764342", "stderr": "",
>> > > "stderr_lines": [], "stdout": "\u001b[32mINFO\u001b[0m: standby
>> > > node FILE-2", "stdout_lines": ["\u001b[32mINFO\u001b[0m: standby
>> > > node FILE-2"]}
>> >
>> >       ........
>> >
>> > crm status output after the above commands:
>> >
>> > > FILE-2:/var/log # crm status
>> > > Cluster Summary:
>> > >   * Stack: corosync
>> > >   * Current DC: FILE-1 (version 2.1.2+20211124.ada5c3b36-
>> > > 150400.2.43-2.1.2+20211124.ada5c3b36) - partition with quorum
>> > >   * Last updated: Wed Mar 15 08:32:27 2023
>> > >   * Last change:  Wed Mar 15 06:07:08 2023 by root via cibadmin on
>> > > FILE-4
>> > >   * 4 nodes configured
>> > >   * 11 resource instances configured (5 DISABLED)
>> > > Node List:
>> > >   * Node FILE-1: standby (with active resources)
>> > >   * Node FILE-3: standby (with active resources)
>> > >   * Node FILE-4: standby (with active resources)
>> > >   * Online: [ FILE-2 ]
>> >
>> > pacemaker logs indicate that FILE-2 received the commands to put it
>> > into standby.
>> >
>> > > FILE-2:/var/log # grep standby /var/log/pacemaker/pacemaker.log
>> > > Mar 15 06:07:08.098 FILE-2 pacemaker-based     [8635]
>> > > (cib_perform_op)  info: ++
>> > >    <nvpair id="num-1-instance_attributes-standby" name="standby"
>> > > value="on"/>
>> > > Mar 15 06:07:08.166 FILE-2 pacemaker-based     [8635]
>> > > (cib_perform_op)  info: ++
>> > >    <nvpair id="num-3-instance_attributes-standby" name="standby"
>> > > value="on"/>
>> > > Mar 15 06:07:08.170 FILE-2 pacemaker-based     [8635]
>> > > (cib_perform_op)  info: ++
>> > >    <nvpair id="num-2-instance_attributes-standby" name="standby"
>> > > value="on"/>
>> > > Mar 15 06:07:08.230 FILE-2 pacemaker-based     [8635]
>> > > (cib_perform_op)  info: ++
>> > >    <nvpair id="num-4-instance_attributes-standby" name="standby"
>> > > value="on"/>
>> >
>> >
>> > The issue is quite intermittent and has been observed on other nodes
>> > as well. We have seen a similar issue when trying to remove nodes from
>> > standby mode (using the "crm node online" command): one or more nodes
>> > fail to come out of standby.
>> >
>> > We suspect it could be an issue with the parallel execution of the
>> > node standby/online commands across all nodes, but this issue wasn't
>> > observed with the Pacemaker packaged with SLES 15 SP2.
>> >
>> > I'm attaching the pacemaker.log from FILE-2 for analysis. Let us know
>> > if any additional information is required.
>> >
>> > OS: SLES15 SP4
>> > Pacemaker version -->
>> >  crmadmin --version
>> > Pacemaker 2.1.2+20211124.ada5c3b36-150400.2.43
>> >
>> > Thanks,
>> > Ayush
>> >
>> > _______________________________________________
>> > Manage your subscription:
>> > https://lists.clusterlabs.org/mailman/listinfo/users
>> >
>> > ClusterLabs home: https://www.clusterlabs.org/
>> --
>> Ken Gaillot <kgaillot at redhat.com>
>>
>>
>

