[ClusterLabs] Pacemaker startup retries
Ken Gaillot
kgaillot at redhat.com
Wed Sep 5 12:13:41 EDT 2018
On Wed, 2018-09-05 at 17:21 +0200, Cesar Hernandez wrote:
> > P.S. If the issue is just a matter of timing when you're starting both
> > nodes, you can start corosync on both nodes first, then start pacemaker
> > on both nodes. That way pacemaker on each node will immediately see the
> > other node's presence.
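
The start order described in the P.S. above could be scripted roughly as
follows. This is only a sketch: it assumes systemd units named corosync
and pacemaker, root ssh access from an admin host, and two hypothetical
hosts called node1 and node2. None of those details come from the
original report, so adjust them to the actual setup (for example,
"service corosync start" on a SysV init system).

```shell
#!/bin/sh
# Hypothetical node names; replace with the real cluster hosts.
NODES="node1 node2"

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

start_cluster() {
    # Phase 1: start corosync on every node so membership forms first.
    for n in $NODES; do
        run ssh "root@$n" systemctl start corosync || return 1
    done
    # Phase 2: start pacemaker only after corosync was started everywhere,
    # so each pacemaker instance sees its peer right away.
    for n in $NODES; do
        run ssh "root@$n" systemctl start pacemaker || return 1
    done
}
```

Running "DRY_RUN=1 start_cluster" prints the four ssh commands without
executing them, which is a cheap way to check the order before using it
on real nodes.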
>
> Well, rebooting a server takes approximately 2 minutes.
> I think I'm going to keep the same workaround I have on other servers:
>
> - set crm stonith-timeout=300s
> - have a "sleep 180" in the fencing script, so the fencing will always
>   take 3 minutes
>
> So when crm fences a node on startup, the fencing script will return
> after 3 minutes. By that time, the other node should be up, so the
> fencing won't be retried.
>
> What do you think about this workaround?
>
>
> The other solution would be updating pacemaker, but I have tested this
> 1.1.14 on many servers, and I don't want to risk updating to 1.1.15 and
> (maybe) running into other new issues...
>
> Thanks a lot!
> Cesar
If you build from source, you can apply the patch that fixes the issue
to the 1.1.14 code base:
https://github.com/ClusterLabs/pacemaker/commit/98457d1635db1222f93599b6021e662e766ce62d
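
A rough sketch of how that could be done, assuming an unpacked 1.1.14
source tree, curl and GNU patch installed, and GitHub's convention of
serving a commit as a patch when ".patch" is appended to its URL. The
apply_fix helper and the temporary file path are hypothetical names used
only for illustration, and the patch may still need manual adjustment if
it does not apply cleanly to 1.1.14:

```shell
#!/bin/sh
# Commit referenced above; ".patch" asks GitHub for a mailbox-style patch.
COMMIT=98457d1635db1222f93599b6021e662e766ce62d
PATCH_URL="https://github.com/ClusterLabs/pacemaker/commit/${COMMIT}.patch"

# apply_fix SRC_DIR
#   SRC_DIR: path to the unpacked Pacemaker 1.1.14 sources (assumption).
apply_fix() {
    src_dir=$1
    patch_file="${TMPDIR:-/tmp}/pacemaker-${COMMIT}.patch"

    # Download the patch; -f makes curl fail on HTTP errors.
    curl -fsSL -o "$patch_file" "$PATCH_URL" || return 1

    # Verify the patch applies before modifying the tree, then apply it.
    (
        cd "$src_dir" || exit 1
        patch -p1 --dry-run < "$patch_file" && patch -p1 < "$patch_file"
    )
}
```

After "apply_fix /path/to/pacemaker-1.1.14" succeeds, rebuild and
reinstall the packages the same way the original 1.1.14 build was done.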
--
Ken Gaillot <kgaillot at redhat.com>