[ClusterLabs] 9 nodes pacemaker cluster setup non-DC nodes reboot parallelly

S Sathish S s.s.sathish at ericsson.com
Tue Jul 16 16:05:36 UTC 2024


Hi Ken,

Thank you for quick response.

We checked the Pacemaker logs and found that the Pacemaker component received signal 15 (SIGTERM). After that, we executed pcs cluster start, and the pacemaker and corosync services started properly and rejoined the cluster.

Regarding the reboot query: in our application's Pacemaker cluster, no quorum or fencing is configured. Please find below the reboot procedure followed in our upgrade, which is executed in parallel on all 9 nodes of the cluster. Is this a recommended way to reboot?


  1. Put the Pacemaker cluster in maintenance mode.
  2. Bring down the Pacemaker cluster services using the commands below:
     # pcs cluster stop
     # pcs cluster disable
  3. Reboot the node.
  4. Bring up the Pacemaker cluster services.
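As a rough sketch, the per-node sequence above might look like the following. This is an assumption about the exact commands used (the original procedure only names the steps); maintenance mode here is set via the cluster property, and the post-reboot half re-enables and starts the services.

```shell
#!/bin/sh
# Sketch of the upgrade reboot procedure per node (assumed commands).
set -e

# 1. Put the cluster in maintenance mode (cluster-wide property).
pcs property set maintenance-mode=true

# 2. Stop cluster services on this node and prevent autostart on boot.
pcs cluster stop
pcs cluster disable

# 3. Reboot the node.
reboot

# --- after the node comes back up ---

# 4. Re-enable and start cluster services, then leave maintenance mode.
pcs cluster enable
pcs cluster start
pcs property set maintenance-mode=false
```

Note that with all 9 nodes running this in parallel, the maintenance-mode property changes are redundant across nodes but harmless.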


Regards,
S Sathish S
________________________________
From: Ken Gaillot <kgaillot at redhat.com>
Sent: Tuesday, July 16, 2024 7:53 PM
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Cc: S Sathish S <s.s.sathish at ericsson.com>
Subject: Re: [ClusterLabs] 9 nodes pacemaker cluster setup non-DC nodes reboot parallelly

On Tue, 2024-07-16 at 11:18 +0000, S Sathish S via Users wrote:
> Hi Team,
>
> In our product we have 9 nodes pacemaker cluster setup non-DC nodes
> reboot parallelly. Most of nodes join cluster properly and only one
> node pacemaker and corosync service is not came up properly with
> below error message.
>
> Error Message:
> Error: error running crm_mon, is pacemaker running?
>   crm_mon: Connection to cluster failed: Connection refused

All that indicates is that Pacemaker is not responding. You'd have to
look at the system log and/or pacemaker.log from that time to find out
more.
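For reference, the relevant logs around the failure window can be pulled with something like the following (paths and the time window are illustrative assumptions):

```shell
# System journal for both daemons around the incident time (adjust window).
journalctl -u pacemaker -u corosync --since "2024-07-16 11:00" --until "2024-07-16 11:30"

# Pacemaker's detail log; location may vary by distribution.
grep -i -e error -e signal /var/log/pacemaker/pacemaker.log
```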

>
> Query : Is it recommended to reboot parallelly of non-DC nodes ?

As long as they are cleanly rebooted, there should be no fencing or
other actual problems. However the cluster will lose quorum and have to
stop all resources. If you reboot less than half of the nodes at one
time and wait for them to rejoin before rebooting more, you would avoid
that.
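To make "less than half" concrete: quorum requires floor(n/2)+1 nodes up, so in an n-node cluster at most n - (floor(n/2)+1) nodes can be down at once without losing quorum. A small sketch (not from the original mail) of that arithmetic:

```shell
# Maximum nodes that may be rebooted simultaneously while retaining quorum.
# Quorum needs floor(n/2)+1 votes, so at most n - (n/2 + 1) nodes may be down
# (integer division).
max_down() {
  n=$1
  echo $(( n - (n / 2 + 1) ))
}

max_down 9   # a 9-node cluster can tolerate 4 nodes down
```

So for this 9-node cluster, rebooting in batches of at most 4 and waiting for each batch to rejoin (e.g. checking pcs status) would keep quorum throughout.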

>
> Thanks and Regards,
> S Sathish S
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
--
Ken Gaillot <kgaillot at redhat.com>
