> On Thu, Jul 4, 2024 at 5:03 AM Artur Novik <freishutz@gmail.com> wrote:
>> Hi everybody,
>> I ran into some strange behavior, and since there was a lot of activity
>> around the crm_node structs in 2.1.7, I want to believe that it's a
>> regression rather than new default behavior.
>>
</i>>><i> "crm_node -i" occasionally, but very often, returns "*exit code 68* : Node
</i>>><i> is not known to cluster".
</i>>><i>
</i>>><i> The quick test below (taken from two different clusters with pacemaker
</i>>><i> 2.1.7 and 2.1.8):
>>
>> ```
>> [root@node1 ~]# crm_node -i
>> Node is not known to cluster
>> [root@node1 ~]# crm_node -i
>> 1
>> [root@node1 ~]# crm_node -i
>> 1
>> [root@node1 ~]# crm_node -i
>> Node is not known to cluster
>> [root@node1 ~]# for i in 1 2 3 4 5 6 7; do ssh node$i crm_node -i; done
>> 1
>> 2
>> Node is not known to cluster
>> Node is not known to cluster
>> 5
>> Node is not known to cluster
>> 7
>> [root@node1 ~]# for i in 1 2 3 4 5 6 7; do sleep 1; ssh node$i crm_node -i; done
>> Node is not known to cluster
>> Node is not known to cluster
>> Node is not known to cluster
>> Node is not known to cluster
>> Node is not known to cluster
>> 6
>> 7
>>
>> [root@es-brick2 ~]# crm_node -i
>> 2
>> [root@es-brick2 ~]# crm_node -i
>> 2
>> [root@es-brick2 ~]# crm_node -i
>> Node is not known to cluster
>> [root@es-brick2 ~]# crm_node -i
>> 2
>> [root@es-brick2 ~]# rpm -qa | grep pacemaker | sort
>> pacemaker-2.1.8.rc2-1.el8_10.x86_64
>> pacemaker-cli-2.1.8.rc2-1.el8_10.x86_64
>> pacemaker-cluster-libs-2.1.8.rc2-1.el8_10.x86_64
>> pacemaker-libs-2.1.8.rc2-1.el8_10.x86_64
>> pacemaker-remote-2.1.8.rc2-1.el8_10.x86_64
>> pacemaker-schemas-2.1.8.rc2-1.el8_10.noarch
>>
>> ```
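
(For reference, the exit status can be checked right next to the message; a
minimal loop, assuming a bash shell on one of the cluster nodes:)

```
# Run crm_node a few times and print its exit status after each run;
# per the report above, the failing runs should show 68 next to
# "Node is not known to cluster".
for attempt in 1 2 3 4 5; do
    crm_node -i
    echo "exit code: $?"
done
```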
>>
>> I checked the following versions (all packages, except the last one, were
>> taken from Rocky Linux and rebuilt against corosync 3.1.8 from Rocky 8.10;
>> the distro itself is Rocky Linux 8.10 too):
>>
>> Pacemaker version      Status
>> 2.1.5 (8.8)            OK
>> 2.1.6 (8.9)            OK
>> 2.1.7 (8.10)           Broken
>> 2.1.8-RC2 (upstream)   Broken
>>
>> I'm not attaching logs for now, since I believe this can be reproduced on
>> absolutely any installation.
>>
> Hi, thanks for the report. I can try to reproduce on 2.1.8 later, but so
> far I'm unable to reproduce on the current upstream main branch. I don't
> believe there are any major differences in the relevant code between main
> and 2.1.8-rc2.
> I wonder if it's an issue where the controller is busy with a synchronous
> request when you run `crm_node -i` (which would be a bug). Can you share
> logs and your config?
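
For what it's worth, until the root cause is found, a retry wrapper along the
following lines can paper over the transient failure. This is only a sketch:
it assumes the condition clears on its own within a few seconds, that exit
code 68 ("Node is not known to cluster") is the only status worth retrying,
and the retry count and one-second delay are arbitrary placeholders.

```
#!/bin/bash
# Hypothetical workaround, not a fix: retry "crm_node -i" when it exits
# with code 68, on the assumption that the failure is momentary (e.g. the
# controller was busy when the query arrived).
get_node_id() {
    local id rc=1
    for attempt in 1 2 3 4 5; do
        id=$(crm_node -i)
        rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "$id"
            return 0
        fi
        [ "$rc" -eq 68 ] || return "$rc"  # only retry "node not known"
        sleep 1
    done
    return "$rc"
}

get_node_id || echo "node ID still unknown after retries" >&2
```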
The logs can be fetched from Google Drive, since they are too large to attach:
https://drive.google.com/file/d/1MLgjYncHXrQlZQ2FAmoGp9blvDtS-8RG/view?usp=drive_link (~65MB with all nodes)
https://drive.google.com/drive/folders/13YYhAtS6zlDjoOOf8ZZQSyfTP_wzLbG_?usp=drive_link (the directory with logs)

The timestamp and node:

[root@es-brick1 ~]# date
Fri Jul 5 10:02:35 UTC 2024

Since this reproduces on multiple KVM guests (RHEL 8, RHEL 9, and Fedora 40),
I attached some info from the hypervisor side too.
>> Thanks,
>> A
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
> --
> Regards,
>
> Reid Wahl (He/Him)
> Senior Software Engineer, Red Hat
> RHEL High Availability - Pacemaker