[ClusterLabs] Cluster Stopped, No Messages?

Mon May 31 01:53:47 EDT 2021

On 5/29/21 12:21 AM, Strahil Nikolov wrote:
> I agree -> fencing is mandatory.
Agreed that with a proper fencing setup the cluster
wouldn't have run into that state.
Still, it would be interesting to find out what
happened. I'm not seeing anything in the log snippet either.
Assuming you are running something systemd-based,
did you check the journal for pacemaker to see what
systemd is thinking?
With the standard unit file, systemd should observe
pacemakerd and restart it if it goes away ungracefully.
You should be able to test this behavior by sending a
SIGKILL to pacemakerd.
pacemakerd in turn watches out for signals from the
sub-daemons it has spawned (I'm currently working
on more in-depth observation here).
So just disappearing shouldn't happen that easily.
Did you find any core-dumps?
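
A rough sketch of the checks above, in case it helps. It only builds
the command lines and does not run anything. It assumes the unit is
named "pacemaker" and the daemon "pacemakerd", and the --since date is
just a placeholder; adjust all three for your system. The function
names are made up for illustration.

```python
import shlex

# Assumptions: unit and daemon names as found on typical systemd-based
# installs. Check your distribution before relying on them.
UNIT = "pacemaker"
DAEMON = "pacemakerd"


def diagnostic_commands(since="2021-05-28"):
    """Return read-only commands for the checks described above.

    `since` is a placeholder date; use the time of your own event.
    """
    return [
        # What systemd logged for the pacemaker unit around the event.
        ["journalctl", "-u", UNIT, "--since", since, "--no-pager"],
        # Whether the unit file asks systemd to restart the daemon.
        ["systemctl", "show", UNIT, "-p", "Restart"],
        # Core dumps recorded for pacemakerd (needs systemd-coredump).
        ["coredumpctl", "list", DAEMON],
    ]


def restart_test_commands():
    """Return commands for the SIGKILL test. Use on a test node only."""
    return [
        # Simulate an ungraceful exit of pacemakerd.
        ["pkill", "-KILL", "-x", DAEMON],
        # systemd should show the unit as restarted shortly afterwards.
        ["systemctl", "status", UNIT, "--no-pager"],
    ]


def render(commands):
    """Format the command lists as shell lines, one per command."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

print(render(diagnostic_commands())) gives lines you can paste into a
root shell. The SIGKILL test takes the node's cluster stack down for a
moment, so I would not run it on a production node without fencing.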

Regards,
Klaus
>
> You can enable the debug logs by editing corosync.conf or 
> /etc/sysconfig/pacemaker.
>
> In case a simple reload doesn't work, you can put the cluster in global
> maintenance mode, then stop and start the stack.
>
>
> Best Regards,
> Strahil Nikolov
>
>     On Fri, May 28, 2021 at 22:13, Digimer
>     <lists at alteeve.ca> wrote:
>     On 2021-05-28 3:08 p.m., Eric Robinson wrote:
>     >
>     >> -----Original Message-----
>     >> From: Digimer <lists at alteeve.ca>
>     >> Sent: Friday, May 28, 2021 12:43 PM
>     >> To: Cluster Labs - All topics related to open-source clustering welcomed
>     >> <users at clusterlabs.org>; Eric Robinson <eric.robinson at psmnv.com>;
>     >> Strahil Nikolov <hunter86_bg at yahoo.com>
>     >> Subject: Re: [ClusterLabs] Cluster Stopped, No Messages?
>     >>
>     >> Shared storage is not what triggers the need for fencing. Coordinating
>     >> actions is what triggers the need. Specifically: if you can run a
>     >> resource on both/all nodes at the same time, you don't need HA. If
>     >> you can't, you need fencing.
>     >>
>     >> Digimer
>     >
>     > Thanks. That said, there is no fencing, so any thoughts on why
>     > the node behaved the way it did?
>
>     Without fencing, when a communication or membership issue arises,
>     it's hard to predict what will happen.
>
>     I don't see anything in the short log snippet to indicate what
>     happened.
>     What's on the other node during the event? When did the node disappear
>     and when was it rejoined, to help find relevant log entries?
>
>     Going forward, if you want predictable and reliable operation,
>     implement fencing ASAP. Fencing is required.
>
>
>     -- 
>     Digimer
>     Papers and Projects: https://alteeve.com/w/
>     "I am, somehow, less interested in the weight and convolutions of
>     Einstein’s brain than in the near certainty that people of equal
>     talent
>     have lived and died in cotton fields and sweatshops." - Stephen
>     Jay Gould
>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/