[ClusterLabs] regression in crm_node in 2.1.7/upstream?

Thu Jul 4 19:37:59 UTC 2024

On Thu, Jul 4, 2024 at 5:03 AM Artur Novik <freishutz at gmail.com> wrote:

> Hi everybody,
> I've run into some strange behavior, and since there was a lot of activity
> around the crm_node structs in 2.1.7, I'd like to believe that it's a
> regression rather than new default behavior.
>
> "crm_node -i" intermittently, but quite often, fails with "*exit code 68* :
> Node is not known to cluster".
>
> A quick test is below (taken from two different clusters running Pacemaker
> 2.1.7 and 2.1.8):
>
> ```
>
> [root@node1 ~]# crm_node -i
> Node is not known to cluster
> [root@node1 ~]# crm_node -i
> 1
> [root@node1 ~]# crm_node -i
> 1
> [root@node1 ~]# crm_node -i
> Node is not known to cluster
> [root@node1 ~]# for i in 1 2 3 4 5 6 7; do ssh node$i crm_node -i; done
> 1
> 2
> Node is not known to cluster
> Node is not known to cluster
> 5
> Node is not known to cluster
> 7
> [root@node1 ~]# for i in 1 2 3 4 5 6 7; do sleep 1; ssh node$i crm_node -i ; done
> Node is not known to cluster
> Node is not known to cluster
> Node is not known to cluster
> Node is not known to cluster
> Node is not known to cluster
> 6
> 7
>
>
> [root@es-brick2 ~]# crm_node -i
> 2
> [root@es-brick2 ~]# crm_node -i
> 2
> [root@es-brick2 ~]# crm_node -i
> Node is not known to cluster
> [root@es-brick2 ~]# crm_node -i
> 2
> [root@es-brick2 ~]# rpm -qa | grep pacemaker | sort
> pacemaker-2.1.8.rc2-1.el8_10.x86_64
> pacemaker-cli-2.1.8.rc2-1.el8_10.x86_64
> pacemaker-cluster-libs-2.1.8.rc2-1.el8_10.x86_64
> pacemaker-libs-2.1.8.rc2-1.el8_10.x86_64
> pacemaker-remote-2.1.8.rc2-1.el8_10.x86_64
> pacemaker-schemas-2.1.8.rc2-1.el8_10.noarch
>
> ```
>
> I checked the following versions (all packages except the last one were
> taken from Rocky Linux and rebuilt against corosync 3.1.8 from Rocky 8.10;
> the distro itself is also Rocky Linux 8.10):
>
> Pacemaker version       Status
> 2.1.5 (8.8)             OK
> 2.1.6 (8.9)             OK
> 2.1.7 (8.10)            Broken
> 2.1.8-RC2 (upstream)    Broken
>
> I'm not attaching logs for now, since I believe this can be reproduced on
> any installation.
>

Hi, thanks for the report. I can try to reproduce on 2.1.8 later, but so
far I'm unable to reproduce on the current upstream main branch. I don't
believe there are any major differences in the relevant code between main
and 2.1.8-rc2.

I wonder if it's an issue where the controller is busy with a synchronous
request when you run `crm_node -i` (which would be a bug). Can you share
logs and your config?
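
For gathering those logs, a rough sketch of a loop that timestamps each failed call, so the failures can be lined up against the Pacemaker log entries from the same moment. The `probe` helper name and its output format are made up for illustration; only `crm_node -i` comes from the report above.

```shell
#!/bin/sh
# Hypothetical helper: run a command COUNT times and print one line per
# failed attempt (UTC timestamp, attempt number, exit code, output),
# followed by a summary line.
# usage: probe COUNT COMMAND [ARGS...]
probe() {
    count=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        # Capture stdout+stderr and the exit code without tripping `set -e`.
        out=$("$@" 2>&1) && rc=0 || rc=$?
        if [ "$rc" -ne 0 ]; then
            failures=$((failures + 1))
            printf '%s attempt=%d rc=%d output=%s\n' \
                "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$i" "$rc" "$out"
        fi
        i=$((i + 1))
    done
    printf 'failures=%d/%d\n' "$failures" "$count"
}
```

Running it as `probe 50 crm_node -i` on an affected node should give a list of failure times to search for in the logs. The timestamps are printed in UTC, so they may need converting if the logs use local time.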

> Thanks,
> A

-- 
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker