[ClusterLabs] Corosync 3.1.5 Fails to Autostart

Jan Friesse jfriesse at redhat.com
Tue Apr 25 03:23:12 EDT 2023


On 24/04/2023 22:16, Tyler Phillippe via Users wrote:
> Hello all,
> 
> We are currently using RHEL9 and have set up a PCS cluster. When restarting the servers, we noticed Corosync 3.1.5 doesn't start properly with the below error message:
> 
> Parse error in config: No valid name found for local host
> Corosync Cluster Engine exiting with status 8 at main.c:1445.
> Corosync.service: Main process exited, code=exited, status=8/n/a
> 
> These are physical, blade machines that are using a 2x Fibre Channel NIC in a Mode 6 bond as their networking interface for the cluster; other than that, there is really nothing special about these machines. We have ensured the names of the machines exist in /etc/hosts and that they can resolve those names via the hosts file first. The strange 

This is really weird. All the described symptoms point to the name 
service (DNS/NIS/...) not being available during bootup and becoming 
available only later. But if /etc/hosts really contains static entries, 
it should just work.

Could you please try to set debug: trace in corosync.conf like
```
...
logging {
     to_syslog: yes
     to_stderr: yes
     timestamp: on
     to_logfile: yes
     logfile: /var/log/cluster/corosync.log

     debug: trace
}
...
```

and observe the very beginning of the corosync output (either in syslog 
or in /var/log/cluster/corosync.log)? There should be something like

totemip_parse: IPv4 address of NAME resolved as IPADDR

Also compare the output of corosync started on boot with the output of 
corosync started later, after multi-user.target.
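
If it helps with the comparison, here is a minimal sketch that pulls the 
name/address pairs out of two saved log excerpts and lists names that 
were resolved only in the manual start. The regular expression is an 
assumption based solely on the example line above (the real log prefix 
may differ), and the function names are hypothetical.

```python
import re

# Assumed message shape, taken from the example line quoted above.
# Timestamps and subsystem prefixes in front of it are ignored.
PATTERN = re.compile(
    r"totemip_parse: IPv[46] address of (?P<name>\S+) resolved as (?P<addr>\S+)"
)


def resolved_names(log_text):
    """Return (name, address) pairs found in totemip_parse lines."""
    return [(m.group("name"), m.group("addr")) for m in PATTERN.finditer(log_text)]


def only_in_manual(boot_log, manual_log):
    """Names resolved in the manual start but missing from the boot-time start."""
    boot = dict(resolved_names(boot_log))
    manual = dict(resolved_names(manual_log))
    return sorted(set(manual) - set(boot))
```

Feed it the text of the log captured right after boot and the text 
captured after a manual start; any name it returns was not resolved 
during boot.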

> thing is if we start Corosync manually after we can SSH into the 
> machines, Corosync starts immediately and without issue. We did manage 
> to get Corosync to autostart properly by modifying the service file and 
> changing the After=network-online.target to After=multi-user.target. In 
> doing this, at first, Pacemaker complains about mismatching dependencies 
> in the service between Corosync and Pacemaker. Changing the Pacemaker 
> service to After=multi-user.target fixes that self-caused issue. Any 
> ideas on this one? Mostly checking to see if changing the After 
> dependency will harm us in the future.

That's questionable. It's always best if name resolution uses /etc/hosts 
reliably, which is not the case now, so IMHO it's better to find the 
reason why /etc/hosts doesn't work rather than "workaround" it.
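
As a starting point for that investigation, a small sketch that shows the 
lookup order configured on the hosts: line of /etc/nsswitch.conf and 
whether a given name has a static entry in /etc/hosts. The helper names 
are hypothetical, and using the machine's own hostname as the name to 
check is an assumption; the name corosync actually looks for comes from 
your corosync.conf.

```python
import socket


def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line of nsswitch.conf text."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            tokens = line[len("hosts:"):].split()
            # Skip action items such as [NOTFOUND=return]
            return [tok for tok in tokens if not tok.startswith("[")]
    return []


def names_in_hosts(hosts_text):
    """Return every name and alias that has a static entry in hosts-file text."""
    names = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        names.update(fields[1:])  # fields[0] is the address
    return names


def report(name=None):
    """Print what the resolver configuration looks like for one name."""
    name = name or socket.gethostname()  # assumption: check the local hostname
    with open("/etc/nsswitch.conf") as f:
        print("hosts lookup order:", hosts_sources(f.read()))
    with open("/etc/hosts") as f:
        print(name, "in /etc/hosts:", name in names_in_hosts(f.read()))
    try:
        addrs = sorted({ai[4][0] for ai in socket.getaddrinfo(name, None)})
        print("getaddrinfo:", addrs)
    except socket.gaierror as exc:
        print("getaddrinfo failed:", exc)
```

Calling report() once from a unit that runs early in boot and once after 
login would show whether the resolver sees the same thing in both cases.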

Regards,
   Honza

> 
> Thanks!
> 
> Respectfully,
>   Tyler Phillippe
> 
> 