[ClusterLabs] corosync.service (and sbd.service) are not stopped on pacemaker shutdown when corosync-qdevice is used

Fri Aug 9 04:19:28 EDT 2019

On 8/9/19 3:39 PM, Jan Friesse wrote:
> Roger Zhou wrote:
>>
>> On 8/9/19 2:27 PM, Roger Zhou wrote:
>>>
>>> On 7/29/19 12:24 AM, Andrei Borzenkov wrote:
>>>> corosync.service sets StopWhenUnneeded=yes which normally stops it when
>>>> pacemaker is shut down.
>>
>> One more thought,
>>
>> Does it make sense to add "RefuseManualStop=true" to pacemaker.service?
>> And the same for corosync-qdevice.service?
>>
>> And "RefuseManualStart=true" to corosync.service?
> 
> I would say the short answer is no, but I would like to hear what the 
> main idea behind this proposal is.

It's more about the out-of-the-box user experience: guiding users, in the 
most common use cases in the field, to manage the whole cluster stack 
with the appropriate steps, namely:

- To start stack: systemctl start pacemaker corosync-qdevice
- To stop stack: systemctl stop corosync.service

and about fewer error-prone assumptions:

With "RefuseManualStop=true" in pacemaker.service, sometimes (if not often),

- it prevents the wrong assumption/wish/impression that stopping
   pacemaker stops the whole cluster together with corosync

- it prevents users from forgetting the additional step of stopping corosync

- it prevents ISVs from creating disruptive scripts that only stop 
pacemaker and forget the others

- being rejected in the first place naturally guides users to run 
`systemctl stop corosync.service`

And extending the same idea a little further:

- "RefuseManualStop=true" in corosync-qdevice.service
- and "RefuseManualStart=true" in corosync.service
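
As a rough sketch of what this proposal would look like in practice (it is only a proposal, not something the packages ship), the three settings could be staged as systemd drop-ins. RefuseManualStop= and RefuseManualStart= are standard [Unit] options; the drop-in file name `10-manual-control.conf`, the `write_dropin` helper and the `DROPIN_ROOT` staging directory are made up for this illustration.

```shell
#!/bin/sh
# Stage the proposed drop-ins in a scratch directory so they can be reviewed
# before anything is installed. DROPIN_ROOT is a hypothetical variable.
DROPIN_ROOT="${DROPIN_ROOT:-./cluster-dropins}"

# write_dropin UNIT SETTING: create UNIT.d/10-manual-control.conf
# containing a [Unit] section with the single SETTING line.
write_dropin() {
    unit="$1"
    setting="$2"
    mkdir -p "$DROPIN_ROOT/$unit.d"
    printf '[Unit]\n%s\n' "$setting" > "$DROPIN_ROOT/$unit.d/10-manual-control.conf"
}

write_dropin pacemaker.service        "RefuseManualStop=true"
write_dropin corosync-qdevice.service "RefuseManualStop=true"
write_dropin corosync.service         "RefuseManualStart=true"
```

To try it, the staged directories would be copied under /etc/systemd/system, followed by `systemctl daemon-reload`. With these settings systemd denies only the explicit request; a unit can still be started or stopped as a dependency of another unit, which is what the start and stop commands above rely on.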

Well, I do feel the corosync* units are less error-prone than pacemaker in this regard.

Thanks,
Roger

> 
> Regards,
>    Honza
> 
>>
>> @Jan, @Ken
>>
>> What do you think?
>>
>> Cheers,
>> Roger
>>
>>
>>>
>>> `systemctl stop corosync.service` is the right command to stop the
>>> cluster stack.
>>>
>>> It stops pacemaker and corosync-qdevice first, and stops SBD too.
>>>
>>> pacemaker.service: After=corosync.service
>>> corosync-qdevice.service: After=corosync.service
>>> sbd.service: PartOf=corosync.service
>>>
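
The relationships listed above can be checked on a given node, since unit files differ between distributions and versions. A minimal sketch, assuming the four unit names used in this thread; the `show_cluster_deps` helper name is made up for the example:

```shell
#!/bin/sh
# Print the ordering and dependency properties discussed in this thread
# for each cluster unit, as systemd currently sees them.
show_cluster_deps() {
    for unit in corosync.service pacemaker.service \
                corosync-qdevice.service sbd.service; do
        echo "== $unit"
        systemctl show --property=After,Requires,PartOf,StopWhenUnneeded "$unit"
    done
}
```

Calling `show_cluster_deps` prints one block per unit. `systemctl list-dependencies --reverse corosync.service` gives the complementary view of which units pull corosync in.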
>>> Conversely, to start the cluster stack, use
>>>
>>> systemctl start pacemaker.service corosync-qdevice
>>>
>>> This is slightly confusing at first impression. So, openSUSE uses
>>> consistent commands, as below:
>>>
>>> crm cluster start
>>> crm cluster stop
>>>
>>> Cheers,
>>> Roger
>>>
>>>> Unfortunately, corosync-qdevice.service declares
>>>> Requires=corosync.service and corosync-qdevice.service itself is *not*
>>>> stopped when pacemaker.service is stopped. Which means corosync.service
>>>> remains "needed" and is never stopped.
>>>>
>>>> Also sbd.service (which is PartOf=corosync.service) remains running 
>>>> as well.
>>>>
>>>> The latter is really bad, as it means the sbd watchdog can kick in at any
>>>> time while the user believes the cluster stack is safely stopped, in
>>>> particular if qnetd is not accessible (think network reconfiguration).
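
One way to notice this situation is to check, after stopping pacemaker, whether any of the other units are still active. A minimal sketch, assuming the unit names from this thread; the `cluster_leftovers` helper is hypothetical and not part of any package:

```shell
#!/bin/sh
# Report cluster units that are still active.
# Returns 1 if at least one unit is still active, 0 otherwise.
cluster_leftovers() {
    rc=0
    for unit in corosync-qdevice.service corosync.service sbd.service; do
        if systemctl is-active --quiet "$unit"; then
            echo "still active: $unit"
            rc=1
        fi
    done
    return "$rc"
}
```

Running `cluster_leftovers` right after `systemctl stop pacemaker.service` and getting a non-zero return would match the behaviour described above, with sbd still armed.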
>>>> _______________________________________________
>>>> Manage your subscription:
>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>
>>>> ClusterLabs home: https://www.clusterlabs.org/
>>>>