[ClusterLabs] Pacemaker startup retries

Wed Sep 5 15:21:23 UTC 2018

> 
> P.S. If the issue is just a matter of timing when you're starting both
> nodes, you can start corosync on both nodes first, then start pacemaker
> on both nodes. That way pacemaker on each node will immediately see the
> other node's presence.

Well, rebooting a server takes approximately 2 minutes.
I think I'm going to keep the same workaround I have on other servers:

- set crm stonith-timeout=300s
- have a "sleep 180" in the fencing script, so fencing always takes 3 minutes

So when crm fences a node on startup, the fencing script will return after 3 minutes. By that time, the other node should be up, so the fencing won't be retried.
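
Roughly, the workaround looks like the sketch below. This is not my real fencing script: do_fence, fence_with_delay and FENCE_DELAY are made-up names for illustration, and putting the sleep after the fencing action (rather than before it) is an assumption. The crm command is shown only as a comment.

```shell
#!/bin/sh
# Sketch of the workaround; all function and variable names are hypothetical.
#
# Cluster side (run once, crmsh syntax):
#   crm configure property stonith-timeout=300s

# Delay in seconds; 180 matches the "sleep 180" in the fencing script.
FENCE_DELAY="${FENCE_DELAY:-180}"

# Placeholder for whatever the real fencing script does to power off or
# reset the target node. Replace with the actual fencing action.
do_fence() {
    echo "fencing $1"
}

# Fence the node, then hold the result back so that the fencing
# operation always lasts at least FENCE_DELAY seconds.
fence_with_delay() {
    do_fence "$1" || return 1
    sleep "$FENCE_DELAY"
}
```

The 180 s delay is longer than the roughly 2 minute reboot and shorter than the 300 s stonith-timeout, so the script should return before the timeout expires.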

What do you think about this workaround?

The other solution would be to update pacemaker, but I have tested this 1.1.14 version on many servers, and I don't want to risk updating to 1.1.15 and (maybe) running into new issues...

Thanks a lot!
Cesar