[ClusterLabs] Antw: Re: Antw: Re: Pacemaker kill does not cause node fault ???

Ferenc Wágner wferi at niif.hu
Wed Feb 8 03:49:20 EST 2017

Ken Gaillot <kgaillot at redhat.com> writes:

> On 02/07/2017 01:11 AM, Ulrich Windl wrote:
>> Ken Gaillot <kgaillot at redhat.com> writes:
>>> On 02/06/2017 03:28 AM, Ulrich Windl wrote:
>>>> Isn't the question: Is crmd a process that is expected to die (and
>>>> thus need restarting)? Or wouldn't one prefer to debug this
>>>> situation? I fear that restarting it might just cover some fatal
>>>> failure...
>>> If crmd or corosync dies, the node will be fenced (if fencing is enabled
>>> and working). If one of the crmd's persistent connections (such as to
>>> the cib) fails, it will exit, so it ends up the same.
>> But isn't it due to crmd not responding to network packets? So if the
>> timeout is long enough, and crmd is started fast enough, will the
>> node really be fenced?
> If crmd dies, it leaves its corosync process group, and I'm pretty sure
> the other nodes will fence it for that reason, regardless of the duration.

See http://lists.clusterlabs.org/pipermail/users/2016-March/002415.html
for a case where a Pacemaker cluster survived a crmd failure and restart.
Re-reading the thread, I'm still unsure what saved our ass from
resources being started in parallel and losing massive data.  I'd fully
expect fencing in such cases...
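
To make the expectation concrete, here is a toy sketch of the behaviour
Ken describes above: a node whose crmd leaves the process group gets
queued for fencing by its peers, no matter how quickly crmd comes back.
This is not Pacemaker code. The class, its fields and its methods are
all made up for illustration, and the March 2016 case linked above
apparently did not follow this model.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model only: NOT Pacemaker code, every name here is hypothetical."""
    fencing_enabled: bool = True
    # Nodes whose crmd is currently in the (modelled) process group.
    members: set = field(default_factory=set)
    # Nodes the remaining peers have decided to fence, in order.
    fence_queue: list = field(default_factory=list)

    def join(self, node):
        self.members.add(node)

    def crmd_exit(self, node):
        # The departure itself is the trigger; no timeout is consulted.
        if node not in self.members:
            return
        self.members.discard(node)
        # Fencing needs to be enabled and at least one peer must remain.
        if self.fencing_enabled and self.members:
            self.fence_queue.append(node)


cluster = ToyCluster()
for n in ("node1", "node2", "node3"):
    cluster.join(n)

cluster.crmd_exit("node2")  # crmd dies on node2 ...
cluster.join("node2")       # ... and is restarted right away
print(cluster.fence_queue)
```

In this model the quick rejoin does not cancel the pending fence
request, which is the outcome I would have expected in our case.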
