[ClusterLabs] corosync/pacemaker resources start after reboot - incorrect node ID calculated

Tue Sep 28 03:34:28 EDT 2021

Erm,

in my corosync.conf I also have 'name: the-name-of-the-host' and 'nodeid: <someid>'.

I don't see these two in your config.

Best Regards,
Strahil Nikolov

On Tuesday, 28 September 2021, 02:39:20 GMT+3, Neitzert, Greg A <greg.neitzert at unisys.com> wrote:

Hello,

We have an issue with a two-node cluster where both nodes were put into standby without the resources being stopped first, so the resources still had target-role=Started. When the two nodes were rebooted, the corosync and pacemaker services started on the first node that came up, and all the resources tried to start. That should not have happened, since standby persists through reboots by default.
Upon closer inspection, we found that the system calculated a different node ID than it usually has. It entered the cluster with the same hostname but without the saved information tied to its previous node ID, so it did not remember that it was in standby and tried to start resources. I believe the issue is a consequence of two factors. First, the network interface that ring0 uses was in the state ‘setup-in-progress’ for some reason when corosync and pacemaker started. Why exactly is still unknown. The corosync systemd unit should wait until network-online.target is reached, but that can mean various things and does not guarantee that a particular interface is up. In our case, we use a dedicated network interface with a 169.x.x.x address to connect to the other node. Other interfaces were up, which probably explains why the target was reached.
In normal cases, the nodeid calculated by corosync is something like 704514049, which corresponds to 169.254.8.1, the IP address of the ring0 interface, with the high bit cleared (our config sets clear_node_high_bit: yes; read literally, 704514049 is 41.254.8.1).

In this particular failing case, that didn’t happen, and it got a nodeid of 2130706433 which converts to 127.0.0.1.  
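
The mapping between these node IDs and IPv4 addresses can be checked with a few lines of Python. This is only a sketch, not corosync's own code: the helper names are hypothetical, and it assumes the auto-generated node ID is the 32-bit value of the ring0 IPv4 address, with the top bit cleared when clear_node_high_bit is yes.

```python
import ipaddress


def nodeid_from_ipv4(addr: str, clear_high_bit: bool = True) -> int:
    """Return the 32-bit integer form of an IPv4 address.

    Assumption: this mirrors how an auto-generated node ID is derived when
    no nodeid is configured; the top bit is masked off if clear_high_bit.
    """
    value = int(ipaddress.IPv4Address(addr))
    if clear_high_bit:
        value &= 0x7FFFFFFF  # keep the lower 31 bits only
    return value


def ipv4_from_int(value: int) -> str:
    """Render a 32-bit integer as a dotted-quad IPv4 address."""
    return str(ipaddress.IPv4Address(value))


print(nodeid_from_ipv4("169.254.8.1"))         # 704514049
print(nodeid_from_ipv4("127.0.0.1"))           # 2130706433
print(ipv4_from_int(704514049 | 0x80000000))   # 169.254.8.1
```

Setting the top bit again on 704514049 gives back 169.254.8.1, while 2130706433 is 127.0.0.1 unchanged because its top bit is already zero.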

On start, the following notable messages were logged:

corosync[3965]:   [TOTEM ] The network interface is down.
[TOTEM ] A new membership (127.0.0.1:4) was formed. Members joined: 2130706433
….
crmd[3978]:   notice: Deleting unknown node 704514049/cbsta-mq1 which has conflicting uname with 2130706433

I believe the notice above is where the saved state for the correct node ID was lost. It indicates that the node entry that maps to the 169 address is being deleted and replaced with the node ID that maps to 127.0.0.1.

Then all the various resources try to start on this node, which should not have happened because the node should have been in standby.

The pengine files confirm that the nodes were in standby. After the node joined under the new node ID, though, that setting was gone, and the resources started because their target-role was Started before all this happened.

Shortly after that, the interface we use for ring0 (eth-ha0) came up:
eth-ha0: link becomes ready

After that the corosync service starts going down:

2021-09-16T00:43:20.022106+01:00 cbsta-mq1 attrd[3976]:   notice: crm_update_peer_proc: Node cbsta-mq1[2130706433] - state is now lost (was member)
2021-09-16T00:43:20.022255+01:00 cbsta-mq1 cib[3973]:   notice: crm_update_peer_proc: Node cbsta-mq1[2130706433] - state is now lost (was member)
2021-09-16T00:43:20.022373+01:00 cbsta-mq1 attrd[3976]:   notice: Removing all cbsta-mq1 attributes for attrd_peer_change_cb
2021-09-16T00:43:20.022524+01:00 cbsta-mq1 cib[3973]:   notice: Removing cbsta-mq1/2130706433 from the membership list
2021-09-16T00:43:20.022639+01:00 cbsta-mq1 attrd[3976]:   notice: Lost attribute writer cbsta-mq1
2021-09-16T00:43:20.022743+01:00 cbsta-mq1 cib[3973]:   notice: Purged 1 peers with id=2130706433 and/or uname=cbsta-mq1 from the membership cache
2021-09-16T00:43:20.022857+01:00 cbsta-mq1 attrd[3976]:   notice: Removing cbsta-mq1/2130706433 from the membership list

The service then restarts, but now it gets the correct node ID (mapping to 169).  

2021-09-16T00:43:20.369715+01:00 cbsta-mq1 corosync[12434]:   [TOTEM ] A new membership (169.254.8.1:12) was formed. Members joined: 704514049
2021-09-16T00:43:20.369830+01:00 cbsta-mq1 corosync[12434]:   [QUORUM] Members[1]: 704514049

It then tries to start the resources again, apparently because the earlier information was lost in the delete above.

The root causes appear to be:
    1. The eth-ha0 interface (ring0) was not completely up when corosync started.  I may be able to do something to try to ensure the interface is completely up…
    2. I believe our corosync.conf may need to be tuned (see below).
    3. I believe we may need to adjust our /etc/hosts, as the hostname from uname -n maps back to 127.0.0.1, which I think is probably not what works best with corosync.
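
For the first point, one possible approach is a small pre-start check that blocks until the ring0 interface reports that it is up. The following is a minimal sketch under stated assumptions: the function name is hypothetical, it relies on the Linux sysfs file /sys/class/net/<iface>/operstate, and it only checks link state, not whether the 169.x.x.x address has been assigned yet.

```python
import time
from pathlib import Path


def wait_for_operstate(iface: str, timeout: float = 30.0, interval: float = 0.5,
                       sysfs_root: str = "/sys/class/net") -> bool:
    """Poll <sysfs_root>/<iface>/operstate until it reads "up".

    Returns True once the interface reports "up", or False if the timeout
    expires first. A missing interface is treated as "not up yet".
    """
    state_file = Path(sysfs_root) / iface / "operstate"
    deadline = time.monotonic() + timeout
    while True:
        try:
            if state_file.read_text().strip() == "up":
                return True
        except OSError:
            pass  # interface directory not present (yet)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A wrapper script could call wait_for_operstate("eth-ha0") and exit non-zero on False, for example from an ExecStartPre= step in a drop-in for the corosync unit. Whether that is the right hook for a given distribution is an assumption to verify.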

The following is our corosync.conf:

totem {
        version:        2
        cluster_name:   cluster2
        clear_node_high_bit: yes
        crypto_hash:    sha1
        crypto_cipher:  aes256
        rrp_mode: active
        wait_time: 150
#       transport:  udp
        interface {
                ringnumber:     0
                bindnetaddr:    169.254.3.0
                mcastaddr:      239.255.1.2
                mcastport:      5405
        }
        interface {
                ringnumber:     1
                bindnetaddr:    172.31.0.0
                mcastaddr:      239.255.2.2
                mcastport:      5407
        }
}

logging {
        fileline:       on
        to_stderr:      no
        to_logfile:     yes
        logfile:        /var/log/cluster/corosync.log
        to_syslog:      yes
        debug:          on
        timestamp:      on
        logger_subsys {
                subsys: QUORUM
                debug:  on
        }
}

quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        expected_votes: 1
        two_node: 0
}

Note that we don’t have a nodelist configuration.  Corosync relies on the bindnetaddr and, I believe, uses the IP address it finds in that range to determine the node ID.

I am wondering if we should be adding something like this:

nodelist {
  node {
    ring0_addr: m660b-qproc4-HA
    nodeid: 1
  }
  node {
    ring0_addr: m660b-qproc3-HA
    nodeid: 2
  }
}

The hostnames above would map to the 169.x.x.x addresses for each node of the cluster.
I think that would (a) ensure the node ID is a stable value (always 1 or 2, not calculated by corosync) and (b) map our ring addresses to the 169 addresses.  Is that right?

Finally, am I correct that the hostnames listed in the nodelist above should be set in the /etc/hosts file to point to the 169 addresses for each host, NOT a hostname that resolves to 127.0.0.1?
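
Whether a given name resolves to a loopback address on a node can be checked quickly with the standard library. This is a rough sketch with a hypothetical helper name; it only shows what the system resolver (including /etc/hosts) returns for IPv4 and says nothing about how corosync itself resolves ring0_addr.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname: str) -> bool:
    """Return True if any IPv4 address the name resolves to is in 127.0.0.0/8."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (address, port) tuple.
    return any(ipaddress.ip_address(info[4][0]).is_loopback for info in infos)


print(resolves_to_loopback("127.0.0.1"))     # True
print(resolves_to_loopback("169.254.8.1"))   # False
```

On a cluster node, calling resolves_to_loopback(socket.gethostname()), or passing one of the names intended for ring0_addr, would show whether /etc/hosts currently points that name at 127.0.0.1.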

Any guidance on these issues, and in general on how to avoid having the cluster calculate a node ID based on the 127.0.0.1 address, which makes it lose its “usual” configuration, would be appreciated.  In most cases, the eth-ha0 interface is up by the time corosync starts, but in the cases where it is not (which occurs randomly), what I described above happens.

Thank you.

Greg Neitzert | Lead Software Engineer | RTC Software Engineering 2B - Middleware 
Unisys | Ph: 612-486-9662 | Cell: 605-929-9118 | Greg.Neitzert at unisys.com 
Home Based – Sioux Falls, SD USA

