[ClusterLabs] Single-node automated startup question

Andrei Borzenkov arvidjaar at gmail.com
Wed Apr 14 11:00:24 EDT 2021


On 14.04.2021 17:50, Digimer wrote:
> Hi all,
> 
>   As we get close to finishing our Anvil! switch to pacemaker, I'm trying
> to tie up loose ends. One that I want feedback on is the pacemaker
> version of cman's old 'post_join_delay' feature.
> 
> Use case example:
> 
>   A common use for the Anvil! is remote deployments where there are no
> (IT) humans available. Think cargo ships, field data collection, etc. So
> it's entirely possible that a node could fail and not be repaired for
> weeks or even months. With this in mind, it's also feasible that a solo
> node later loses power, and then reboots. In such a case, 'pcs cluster
> start' would never go quorate as the peer is dead.
> 
>   In cman, during startup, if there was no reply from the peer after
> post_join_delay seconds, the peer would get fenced and then the cluster
> would finish coming up. Being two_node, it would also become quorate and
> start hosting services. Of course, this opens the risk of a fence loop,
> but we have other protections in place to prevent that, so a fence loop
> is not a concern.
> 
>   My question then is two-fold:
> 
> 1. Is there a pacemaker equivalent to 'post_join_delay'? (Fence the peer
> and, if successful, become quorate)?
> 

Startup fencing is the pacemaker default (see the startup-fencing cluster
option, which defaults to true).
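
A minimal sketch of setting the option explicitly, assuming a pcs-managed
cluster and root privileges on a cluster node. Since true is already the
default, this should only matter if the option was disabled earlier. The
STARTUP_FENCING variable is just an illustrative name, not part of pcs or
pacemaker.

```shell
# Sketch only: assumes a pcs-managed Pacemaker cluster and root privileges.
STARTUP_FENCING=true   # Pacemaker's default value for startup-fencing

if command -v pcs >/dev/null 2>&1; then
    # Set the cluster-wide option explicitly (it is already the default).
    pcs property set startup-fencing="$STARTUP_FENCING"
else
    # Not on a cluster node: do nothing except say so.
    echo "pcs not found; run this on a cluster node" >&2
fi
```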

> 2. If not, was this a conscious decision not to add it for some reason,
> or was it simply never added? If it was consciously decided to not have
> it, what was the reasoning behind it?
> 
>   I can replicate this behaviour in our code, but I don't want to do
> that if there is a compelling reason that I am not aware of.
> 
> So,
> 
> A) is there a pacemaker version of post_join_delay?
> B) is there a compelling argument NOT to use post_join_delay behaviour
> in pacemaker I am not seeing?
> 
> Thanks!
> 


