[ClusterLabs] Single-node automated startup question
lists at alteeve.ca
Wed Apr 14 10:50:26 EDT 2021
As we get close to finishing our Anvil! switch to pacemaker, I'm trying
to tie up loose ends. One that I want feedback on is the pacemaker
version of cman's old 'post_join_delay' feature.
Use case example:
A common use for the Anvil! is remote deployments where there are no
(IT) humans available. Think cargo ships, field data collection, etc. So
it's entirely possible that a node could fail and not be repaired for
weeks or even months. With this in mind, it's also feasible that a solo
node later loses power, and then reboots. In such a case, the cluster
would never become quorate after 'pcs cluster start', as the peer is dead.
In cman, during startup, if there was no reply from the peer after
post_join_delay seconds, the peer would get fenced and then the cluster
would finish coming up. Being two_node, it would also become quorate and
start hosting services. Of course, this opens the risk of a fence loop,
but we have other protections in place to prevent that, so a fence loop
is not a concern.
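To make the behaviour I'm describing concrete, here is a rough sketch of
the startup logic in Python. It is only an illustration of the sequence
above, not cman's or pacemaker's actual code. The function name and the
'peer_joined' and 'fence_peer' callables are hypothetical placeholders
for whatever membership check and fence call would really be used, and
the return strings are made up for the example.

```python
import time
from typing import Callable


def start_with_post_join_delay(
    peer_joined: Callable[[], bool],   # hypothetical: True once the peer is seen
    fence_peer: Callable[[], bool],    # hypothetical: True if the fence succeeded
    post_join_delay: float,            # seconds to wait for the peer
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait for the peer; if it never shows up, fence it before proceeding."""
    deadline = clock() + post_join_delay
    while True:
        if peer_joined():
            # Normal case: both nodes are up, no fencing needed.
            return "peer-joined"
        if clock() >= deadline:
            break
        sleep(poll_interval)

    # No reply within post_join_delay: fence the peer. Only a confirmed
    # fence makes it safe to go quorate alone and start hosting services.
    if fence_peer():
        return "fenced-peer"
    return "fence-failed"
```

The clock and sleep functions are passed in only so the logic can be
exercised without real waiting. The important part is the last step: the
node proceeds on its own only after the fence is confirmed.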
My question then is two-fold:
1. Is there a pacemaker equivalent to 'post_join_delay' (fence the peer
and, if successful, become quorate)?
2. If not, was this a conscious decision not to add it for some reason,
or was it simply never added? If it was consciously decided to not have
it, what was the reasoning behind it?
I can replicate this behaviour in our code, but I don't want to do
that if there is a compelling reason that I am not aware of.
A) Is there a pacemaker version of post_join_delay?
B) Is there a compelling argument NOT to use post_join_delay behaviour
in pacemaker that I am not seeing?
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould