[ClusterLabs] Antw: [EXT] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

Wed Jun 16 04:46:52 EDT 2021

On 6/16/21 3:03 PM, Andrei Borzenkov wrote:

> 
>>>
>>> We thought that access to storage was restored, but one step was
>>> missing so devices appeared empty.
>>>
>>> At this point I tried to restart the pacemaker. But as soon as I
>>> stopped pacemaker SBD rebooted nodes ‑ which is logical, as quorum was
>>> now lost.
>>>
>>> How to cleanly stop pacemaker in this case and keep nodes up?
>>
>> Unconfigure sbd devices I guess.
>>
> 
> Do you have *practical* suggestions on how to do it online in a
> running pacemaker cluster? Can you explain how it is going to help
> given that lack of sbd device was not the problem in the first place?

I would rephrase this issue as "how to gracefully shut down sbd to deregister
sbd from pacemaker for the whole cluster". There seems to be no way to do that
except `systemctl stop corosync`.

With that, to keep sbd from self-fencing, I think the somewhat tricky steps
below might help, though I'm not sure they fit your situation as a whole.

crm cluster run "systemctl stop pacemaker"
crm cluster run "systemctl stop corosync"
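
The same two steps can be wrapped so that corosync is only stopped once the
pacemaker stop has succeeded. This is only a sketch: the function name
`stop_cluster_stack` and the `RUNNER` variable are hypothetical, added for
illustration, and I have not verified that this ordering keeps sbd from fencing
in the failed-device case.

```shell
#!/bin/sh
# Sketch only: stop pacemaker cluster-wide, then corosync, in that order.
# stop_cluster_stack and RUNNER are hypothetical names, not part of crmsh.
stop_cluster_stack() {
    # RUNNER=echo prints the commands instead of executing them (dry run).
    runner=${RUNNER:-}
    for svc in pacemaker corosync; do
        # Abort on the first failure so corosync is never stopped
        # while the pacemaker stop has not completed successfully.
        $runner crm cluster run "systemctl stop $svc" || return 1
    done
}
```

Preview with `RUNNER=echo stop_cluster_stack`; call `stop_cluster_stack` with
`RUNNER` unset to actually run the commands.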

BR,
Roger