[ClusterLabs] notify action asynchronous ? (was: why and when a call of crm_attribute can be delayed ?)
Jehan-Guillaume de Rorthais
jgdr at dalibo.com
Thu May 12 09:37:29 UTC 2016
On Sun, 8 May 2016 16:35:25 +0200,
Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:
> On Sat, 7 May 2016 00:27:04 +0200,
> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:
>
> > On Wed, 4 May 2016 09:55:34 -0500,
> > Ken Gaillot <kgaillot at redhat.com> wrote:
> ...
> > > There would be no point in the pre-promote notify waiting for the
> > > attribute value to be retrievable, because the cluster isn't going to
> > > wait for the pre-promote notify to finish before calling promote.
> >
> > Oh, this is surprising. I thought the pseudo action
> > "*_confirmed-pre_notify_demote_0" in the transition graph meant waiting
> > for the return code of each resource clone before going on with the
> > transition. The graph is confusing: if the cluster isn't going to wait for
> > the pre-promote notify to finish before calling promote, I suppose some
> > arrows should point directly from the start (or post-start-notify?) action
> > to the promote action, shouldn't they?
> >
> > This is quite worrying, as our RA relies a lot on notifications. For
> > instance, we try to recover a PostgreSQL instance during pre-start or
> > pre-demote if we detect a recover action...
>
> I'm coming back to this point.
>
> Looking at this documentation page:
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-config-testing-changes.html
>
> I can read "Arrows indicate ordering dependencies".
>
> Looking at the transition graph I am studying (see attachment, a simple
> master resource move), I still don't understand how the cluster isn't going to
> wait for a pre-promote notify to finish before calling promote.
>
> So either I misunderstood your words or I am missing something else important,
> which is quite possible as I am fairly new to this world. Anyway, I am trying
> to make an RA as robust as possible, and any pointers/docs are welcome!
I tried to trigger this potential asynchronous behavior of the notify action,
but couldn't observe it.

I added a different sleep period in the notify action for each node of my
cluster:
* 10s for hanode1
* 15s for hanode2
* 20s for hanode3
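
To make this setup concrete, here is a minimal Python sketch of such a
per-node delay. It is an illustration only, not the actual RA code: the names
NOTIFY_DELAYS, notify_delay and notify are hypothetical, and the hostname
lookup assumes the short hostname matches the cluster node name. Only the node
names and durations come from the test described above.

```python
import socket
import time
from typing import Callable, Optional

# Artificial delay in seconds per node, as in the test above.
NOTIFY_DELAYS = {"hanode1": 10, "hanode2": 15, "hanode3": 20}

OCF_SUCCESS = 0  # OCF return code meaning "success"


def notify_delay(node: str) -> int:
    """Return the artificial delay for a node, or 0 for an unknown node."""
    return NOTIFY_DELAYS.get(node, 0)


def notify(node: Optional[str] = None,
           sleep: Callable[[float], None] = time.sleep) -> int:
    """Hypothetical notify handler: wait for the node's delay, then succeed."""
    if node is None:
        # Assumption: the short hostname is the cluster node name.
        node = socket.gethostname().split(".")[0]
    sleep(notify_delay(node))
    return OCF_SUCCESS
```

The sleep function is passed in as a parameter only so the handler can be
exercised without really waiting.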
The master was on hanode1 and the DC was hanode1. While moving the master
resource to hanode2, I can see in the log files that the DC is always
waiting for the rc of hanode3 before triggering the next action in the
transition.
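
This observation is what a simple barrier would produce: the next action
starts only once the slowest node has answered. The following Python sketch is
a toy model of that behavior, not Pacemaker code; run_notify and
notify_all_then_continue are hypothetical names, and threads stand in for the
cluster nodes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_notify(node, delay):
    """Pretend to run the notify action on a node, then return (node, rc)."""
    time.sleep(delay)
    return node, 0


def notify_all_then_continue(delays):
    """Run notify on every node in parallel and wait for all return codes.

    Returns the list of (node, rc) pairs and the elapsed time in seconds.
    """
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(run_notify, node, delay)
                   for node, delay in delays.items()]
        # Barrier: result() blocks until each node has answered.
        results = [future.result() for future in futures]
    return results, time.monotonic() - start
```

With the delays used in the test (10s, 15s and 20s), such a barrier would let
the transition continue after roughly 20 seconds, that is, once hanode3 has
returned.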
So, **in practice**, it seems the notify action is synchronous. In theory, I
still wonder whether I misunderstood your words...
Regards,