[ClusterLabs] corosync/pacemaker resources start after reboot - incorrect node ID calculated

Neitzert, Greg A greg.neitzert at unisys.com
Mon Sep 27 19:39:05 EDT 2021


Hello,

We have an issue with a 2-node cluster where both nodes were put into
standby (but the resources were not stopped first, so they were still in
target-role=Started). When the two nodes were rebooted, the corosync and
pacemaker services started on the first node that came up, but the resources
all tried to start, which should not have happened (standby persists through
reboots by default).

Upon closer inspection, it was found that the system calculated a different
node ID than usual and entered the cluster with the same hostname, but
without the saved information from the previous cluster ID, so it didn't
remember it was in standby and tried to come up. I believe the issue is a
consequence of two factors. First, the network interface that ring0 uses
was, for some reason, in the 'setup-in-progress' state when corosync and
pacemaker started. Why exactly is still unknown. The corosync systemd unit
waits until network-online.target is reached, but that target can mean
various things and doesn't guarantee that a particular interface is up. In
our case, we use a dedicated network interface with a 169.x.x.x address to
connect to the other node. Other interfaces were up, which probably explains
why the target was reached.

In normal cases, the nodeid calculated by corosync is something like
704514049, which converts to 169.254.8.1, the IP address of the ring0
interface.

In this particular failing case, that didn't happen, and it got a nodeid of
2130706433, which converts to 127.0.0.1.
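
For reference, the auto-generated nodeid appears to be the ring0 IPv4
address packed into a 32-bit integer, with the top bit masked off because we
set clear_node_high_bit: yes (see corosync.conf below). A minimal bash
sketch of that arithmetic, which reproduces both IDs above:

ip=169.254.8.1
IFS=. read -r a b c d <<< "$ip"
echo $(( ((a << 24) | (b << 16) | (c << 8) | d) & 0x7fffffff ))
# prints 704514049; with ip=127.0.0.1 it prints 2130706433 (top bit already 0)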


On startup, the following notable messages were logged:

corosync[3965]:   [TOTEM ] The network interface is down.
[TOTEM ] A new membership (127.0.0.1:4) was formed. Members joined: 2130706433
..
crmd[3978]:   notice: Deleting unknown node 704514049/cbsta-mq1 which has conflicting uname with 2130706433

It was the above notice where I believe the saved configuration for the
correct node was lost. It indicates that the node entry mapping to the 169
address is being deleted and replaced with the node ID that maps to
127.0.0.1.

Then all the various resources tried to start on this node, which should not
have happened (the node should still have been in standby).

The pengine files verify that the node was in standby, but after it rejoined
with the new node ID it no longer had that setting, and the resources
started because their target-role had been Started before all of this
happened.

Shortly after that, the interface we use for ring0 (eth-ha0) came up:

eth-ha0: link becomes ready

After that, the corosync service starts going down:

2021-09-16T00:43:20.022106+01:00 cbsta-mq1 attrd[3976]:   notice: crm_update_peer_proc: Node cbsta-mq1[2130706433] - state is now lost (was member)
2021-09-16T00:43:20.022255+01:00 cbsta-mq1 cib[3973]:   notice: crm_update_peer_proc: Node cbsta-mq1[2130706433] - state is now lost (was member)
2021-09-16T00:43:20.022373+01:00 cbsta-mq1 attrd[3976]:   notice: Removing all cbsta-mq1 attributes for attrd_peer_change_cb
2021-09-16T00:43:20.022524+01:00 cbsta-mq1 cib[3973]:   notice: Removing cbsta-mq1/2130706433 from the membership list
2021-09-16T00:43:20.022639+01:00 cbsta-mq1 attrd[3976]:   notice: Lost attribute writer cbsta-mq1
2021-09-16T00:43:20.022743+01:00 cbsta-mq1 cib[3973]:   notice: Purged 1 peers with id=2130706433 and/or uname=cbsta-mq1 from the membership cache
2021-09-16T00:43:20.022857+01:00 cbsta-mq1 attrd[3976]:   notice: Removing cbsta-mq1/2130706433 from the membership list

The service then restarts, but now it gets the correct node ID (mapping to
the 169 address).

2021-09-16T00:43:20.369715+01:00 cbsta-mq1 corosync[12434]:   [TOTEM ] A new membership (169.254.8.1:12) was formed. Members joined: 704514049
2021-09-16T00:43:20.369830+01:00 cbsta-mq1 corosync[12434]:   [QUORUM] Members[1]: 704514049

It then tries to start the resources again, because it has apparently lost
the previous information in the delete noted above.

The root issue appears to be:

1. The eth-ha0 (ring0) interface was not completely up when corosync
started. I may be able to do something to ensure the interface is fully up
first (see the sketch after this list).
2. I believe our corosync.conf may need to be tuned (see below).
3. I believe we may need to adjust our /etc/hosts, as the hostname from
uname -n maps back to 127.0.0.1, which I think is probably not what works
best with corosync.
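
As a possible mitigation for item 1, a systemd drop-in could poll for the
ring0 address before corosync starts. This is only a sketch, assuming the
interface name eth-ha0, a 169.254.x.x address, and an arbitrary 60-second
limit:

# /etc/systemd/system/corosync.service.d/wait-ring0.conf
[Service]
# Wait up to 60s for an IPv4 address on eth-ha0; if it never appears,
# the start fails instead of corosync falling back to 127.0.0.1
ExecStartPre=/usr/bin/timeout 60 /bin/sh -c 'until ip -4 addr show dev eth-ha0 | grep -q "inet 169.254."; do sleep 1; done'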


The following is our corosync.conf:

totem {
        version:        2
        cluster_name:   cluster2
        clear_node_high_bit: yes
        crypto_hash:    sha1
        crypto_cipher:  aes256
        rrp_mode: active
        wait_time: 150
#       transport:  udp
        interface {
                ringnumber:     0
                bindnetaddr:    169.254.3.0
                mcastaddr:      239.255.1.2
                mcastport:      5405
        }
        interface {
                ringnumber:     1
                bindnetaddr:    172.31.0.0
                mcastaddr:      239.255.2.2
                mcastport:      5407
        }
}

logging {
        fileline:       on
        to_stderr:      no
        to_logfile:     yes
        logfile:        /var/log/cluster/corosync.log
        to_syslog:      yes
        debug:          on
        timestamp:      on
        logger_subsys {
                subsys: QUORUM
                debug:  on
        }
}

quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        expected_votes: 1
        two_node: 0
}

Note that we don't have a nodelist configuration. Corosync is relying on
bindnetaddr and, I believe, uses the IP address it finds in that range to
determine the node ID.
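
For what it's worth, the node ID corosync actually chose can be inspected at
runtime. A quick sketch, assuming corosync 2.x (with the cmap runtime keys)
and pacemaker's crm_node are available:

corosync-cmapctl -g runtime.votequorum.this_node_id   # nodeid as corosync sees it
crm_node -i                                           # nodeid as pacemaker sees it
corosync-cmapctl | grep members                       # current membership and ring addresses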


I am wondering if we should be adding something like this:

nodelist {
  node {
    ring0_addr: m660b-qproc4-HA
    nodeid: 1
  }
  node {
    ring0_addr: m660b-qproc3-HA
    nodeid: 2
  }
}

Where the hostnames above map to the 169.x.x.x addresses for each node of
the cluster.

I think that will ensure (a) that the node ID is a stable value (always 1 or
2, not calculated by corosync), and (b) that our ring addresses map to the
169 addresses as well?

Finally, am I correct that the hostnames listed in the nodelist above should
be set in /etc/hosts to point to the 169 addresses for each host, and should
NOT resolve to 127.0.0.1?
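
For illustration, something like the following on each node, where the
169.254.8.x addresses are placeholders for the real ring0 addresses:

127.0.0.1     localhost
# ring0 addresses - make sure uname -n does not resolve to 127.0.0.1
169.254.8.1   m660b-qproc4-HA
169.254.8.2   m660b-qproc3-HA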


Any guidance on these issues, and in general on how to avoid having the
cluster calculate a node ID based on the 127.0.0.1 address (which makes it
lose its "usual" configuration), would be appreciated. In most cases, the
eth-ha0 interface is up by the time corosync starts, but in the cases where
it is not (which occurs randomly), what I described above happens.

Thank you.


Greg Neitzert | Lead Software Engineer | RTC Software Engineering 2B - Middleware
Unisys | Ph: 612-486-9662 | Cell: 605-929-9118 | Greg.Neitzert at unisys.com
Home Based - Sioux Falls, SD USA

