[ClusterLabs] Single-node automated startup question

Wed Apr 14 10:50:26 EDT 2021

Hi all,

  As we get close to finishing our Anvil! switch to pacemaker, I'm trying
to tie up loose ends. One that I want feedback on is the pacemaker
version of cman's old 'post_join_delay' feature.

Use case example:

  A common use for the Anvil! is remote deployments where there are no
(IT) humans available. Think cargo ships, field data collection, etc. So
it's entirely possible that a node could fail and not be repaired for
weeks or even months. With this in mind, it's also feasible that a solo
node later loses power, and then reboots. In such a case, the node would
never become quorate after 'pcs cluster start', as the peer is dead.

  In cman, during startup, if there was no reply from the peer after
post_join_delay seconds, the peer would get fenced and then the cluster
would finish coming up. Being two_node, it would also become quorate and
start hosting services. Of course, this opens the risk of a fence loop,
but we have other protections in place to prevent that, so a fence loop
is not a concern.
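
  To make the behaviour I'm describing concrete, here is a rough sketch
of the logic (not cman's actual implementation). Every name in it is
hypothetical: 'peer_is_member' and 'fence_peer' stand in for whatever
membership check and fence call would really be used, and the clock and
sleep functions are parameters only so the logic can be exercised
without waiting.

```python
import time
from typing import Callable


def start_with_post_join_delay(
    peer_is_member: Callable[[], bool],
    fence_peer: Callable[[], bool],
    post_join_delay: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait for the peer, then fence it if it never shows up.

    Returns "peer_joined", "fenced_peer" or "fence_failed".
    All callables are hypothetical stand-ins supplied by the caller.
    """
    deadline = clock() + post_join_delay

    # Poll for the peer until post_join_delay seconds have elapsed.
    while clock() < deadline:
        if peer_is_member():
            return "peer_joined"
        sleep(poll_interval)

    # One last check so a peer that joined during the final sleep
    # is not fenced needlessly.
    if peer_is_member():
        return "peer_joined"

    # No reply from the peer: fence it. Only a confirmed fence makes it
    # safe for the surviving node to carry on alone.
    return "fenced_peer" if fence_peer() else "fence_failed"
```

  The caller would only go on to host services on "peer_joined" or
"fenced_peer"; on "fence_failed" it would have to stay down.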

  My question, then, is two-fold:

1. Is there a pacemaker equivalent to 'post_join_delay' (fence the peer
and, if successful, become quorate)?

2. If not, was this a conscious decision not to add it for some reason,
or was it simply never added? If it was consciously decided to not have
it, what was the reasoning behind it?

  I can replicate this behaviour in our code, but I don't want to do
that if there is a compelling reason that I am not aware of.

So,

A) is there a pacemaker version of post_join_delay?
B) is there a compelling argument NOT to use post_join_delay behaviour
in pacemaker that I am not seeing?

Thanks!

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould