[ClusterLabs] Is corosync supposed to be restarted if it dies?

Andrei Borzenkov arvidjaar at gmail.com
Tue Nov 28 14:35:41 EST 2017


28.11.2017 13:01, Jan Pokorný wrote:
> On 27/11/17 17:43 +0300, Andrei Borzenkov wrote:
>> Sent from iPhone
>>
>>> On 27 Nov 2017, at 14:36, Ferenc Wágner <wferi at niif.hu> wrote:
>>>
>>> Andrei Borzenkov <arvidjaar at gmail.com> writes:
>>>
>>>> 25.11.2017 10:05, Andrei Borzenkov wrote:
>>>>
>>>>> In one of the guides, the suggested procedure to simulate split brain was
>>>>> to kill the corosync process. It actually worked on one cluster, but on
>>>>> another the corosync process was restarted after being killed without the
>>>>> cluster noticing anything. Except that after several attempts pacemaker
>>>>> died with stopping resources ... :)
>>>>>
>>>>> This is SLES12 SP2; I do not see any Restart in the service definition,
>>>>> so it is probably not systemd.
>>>>>
>>>> FTR - it was not corosync, but pacemaker; its unit file specifies
>>>> Restart=on-failure, so killing corosync caused pacemaker to fail and be
>>>> restarted by systemd.
>>>
>>> And starting corosync via a Requires dependency?
>>
>> Exactly.
> 
> From my testing it looks like we should change
> "Requires=corosync.service" to "BindsTo=corosync.service"
> in pacemaker.service.
> 
> Could you give it a try?
> 

I'm not sure what the expected outcome is, but pacemaker.service is still
restarted (due to Restart=on-failure). If the intention is to
unconditionally stop it when corosync dies, pacemaker should probably
exit with a unique code, and the unit file should have
RestartPreventExitStatus set to that code.
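
As a rough sketch of that idea, a systemd drop-in for pacemaker.service
could look like the following. The drop-in path and the exit status 100
are assumptions made for illustration only; pacemaker itself would have
to be changed to exit with a dedicated code when it loses corosync.

```ini
# /etc/systemd/system/pacemaker.service.d/override.conf
# Hypothetical drop-in; the file name and the exit status 100 are
# placeholders, not values taken from the shipped unit file.

[Unit]
# Stop pacemaker whenever corosync.service stops or exits unexpectedly.
BindsTo=corosync.service

[Service]
# Suppress the automatic restart (Restart=on-failure) when the main
# process exits with this status.
RestartPreventExitStatus=100
```

After creating or changing the drop-in, "systemctl daemon-reload" is
needed for systemd to pick it up.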
