> On Thu, Jul 4, 2024 at 5:03 AM Artur Novik <freishutz@gmail.com> wrote:
>> Hi everybody,
>> I ran into some strange behavior, and since there was a lot of activity
>> around the crm_node structs in 2.1.7, I want to believe that it's a
>> regression rather than new default behavior.
>>
</i>>><i> "crm_node -i" occasionally, but very often, returns "*exit code 68* : Node
</i>>><i> is not known to cluster".
</i>>><i>
</i>>><i> The quick test below (taken from two different clusters with pacemaker
</i>>><i> 2.1.7 and 2.1.8):
>>
>> ```
>> [root@node1 ~]# crm_node -i
>> Node is not known to cluster
>> [root@node1 ~]# crm_node -i
>> 1
>> [root@node1 ~]# crm_node -i
>> 1
>> [root@node1 ~]# crm_node -i
>> Node is not known to cluster
>> [root@node1 ~]# for i in 1 2 3 4 5 6 7; do ssh node$i crm_node -i; done
>> 1
>> 2
>> Node is not known to cluster
>> Node is not known to cluster
>> 5
>> Node is not known to cluster
>> 7
>> [root@node1 ~]# for i in 1 2 3 4 5 6 7; do sleep 1; ssh node$i crm_node -i; done
>> Node is not known to cluster
>> Node is not known to cluster
>> Node is not known to cluster
>> Node is not known to cluster
>> Node is not known to cluster
>> 6
>> 7
>>
>> [root@es-brick2 ~]# crm_node -i
>> 2
>> [root@es-brick2 ~]# crm_node -i
>> 2
>> [root@es-brick2 ~]# crm_node -i
>> Node is not known to cluster
>> [root@es-brick2 ~]# crm_node -i
>> 2
>> [root@es-brick2 ~]# rpm -qa | grep pacemaker | sort
>> pacemaker-2.1.8.rc2-1.el8_10.x86_64
>> pacemaker-cli-2.1.8.rc2-1.el8_10.x86_64
>> pacemaker-cluster-libs-2.1.8.rc2-1.el8_10.x86_64
>> pacemaker-libs-2.1.8.rc2-1.el8_10.x86_64
>> pacemaker-remote-2.1.8.rc2-1.el8_10.x86_64
>> pacemaker-schemas-2.1.8.rc2-1.el8_10.noarch
>>
>> ```
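
(For reference, the exit status can be checked right next to the message; a
minimal loop, assuming a bash shell on one of the cluster nodes:)

```
# Run crm_node a few times and print its exit status after each run;
# per the report above, the failing runs should show 68 next to
# "Node is not known to cluster".
for attempt in 1 2 3 4 5; do
    crm_node -i
    echo "exit code: $?"
done
```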
>>
>> I checked the following versions (all packages, except the last one, were
>> taken from Rocky Linux and rebuilt against corosync 3.1.8 from Rocky 8.10;
>> the distro itself is Rocky Linux 8.10 too):
>>
>> Pacemaker version      Status
>> 2.1.5 (8.8)            OK
>> 2.1.6 (8.9)            OK
>> 2.1.7 (8.10)           Broken
>> 2.1.8-RC2 (upstream)   Broken
>>
>> I'm not attaching logs for now, since I believe this can be reproduced on
>> absolutely any installation.
>>
> Hi, thanks for the report. I can try to reproduce on 2.1.8 later, but so
> far I'm unable to reproduce on the current upstream main branch. I don't
> believe there are any major differences in the relevant code between main
> and 2.1.8-rc2.
> I wonder if it's an issue where the controller is busy with a synchronous
> request when you run `crm_node -i` (which would be a bug). Can you share
> logs and your config?
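
For what it's worth, until the root cause is found, a retry wrapper along the
following lines can paper over the transient failure. This is only a sketch:
it assumes the condition clears on its own within a few seconds, that exit
code 68 ("Node is not known to cluster") is the only status worth retrying,
and the retry count and one-second delay are arbitrary placeholders.

```
#!/bin/bash
# Hypothetical workaround, not a fix: retry "crm_node -i" when it exits
# with code 68, on the assumption that the failure is momentary (e.g. the
# controller was busy when the query arrived).
get_node_id() {
    local id rc=1
    for attempt in 1 2 3 4 5; do
        id=$(crm_node -i)
        rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "$id"
            return 0
        fi
        [ "$rc" -eq 68 ] || return "$rc"  # only retry "node not known"
        sleep 1
    done
    return "$rc"
}

get_node_id || echo "node ID still unknown after retries" >&2
```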
The logs can be fetched from Google Drive, since they are too large to attach:
https://drive.google.com/file/d/1MLgjYncHXrQlZQ2FAmoGp9blvDtS-8RG/view?usp=drive_link (~65MB with all nodes)
https://drive.google.com/drive/folders/13YYhAtS6zlDjoOOf8ZZQSyfTP_wzLbG_?usp=drive_link (the directory with logs)

The timestamp and node:

[root@es-brick1 ~]# date
Fri Jul 5 10:02:35 UTC 2024

Since this reproduces on multiple KVM guests (RHEL 8, RHEL 9, and Fedora 40),
I attached some info from the hypervisor side too.
>> Thanks,
>> A
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
> --
> Regards,
>
> Reid Wahl (He/Him)
> Senior Software Engineer, Red Hat
> RHEL High Availability - Pacemaker