[ClusterLabs Developers] pacemaker POC to execute external program in case of RA timeout
kwenning at redhat.com
Mon May 31 07:12:12 EDT 2021
On 5/31/21 10:53 AM, Emil Penchev wrote:
> Hi all,
> I'm writing about an issue we received from a pacemaker user
> concerning an RA timeout.
> Some users have hit a timeout in an RA script/program, and this
> led to a major outage for them.
> As is typical in these cases, there is no additional useful
> information explaining why it happened.
> The user has proposed a solution: a POC that instruments
> pacemaker directly and adds a way to trigger further debugging
> via an external callout program.
> One can set an environment variable, for example *PCMK_timeout_prog*,
> pointing to an external program or script that is executed on an
> RA timeout, e.g. to gather more useful debug information.
> Here is the proposed POC change, with minor modifications.
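The POC patch itself is not shown here, so the following is only a rough Python sketch of the general idea, not the user's change to Pacemaker. Only `PCMK_timeout_prog` comes from the proposal. The function name `run_with_timeout_callout`, the calling convention (resource name, action, and PID of the still-running agent passed as arguments), and the 30-second limit on the callout are all hypothetical choices made for illustration.

```python
import os
import shlex
import subprocess
import sys
from typing import Mapping, Optional, Sequence, Tuple


def run_with_timeout_callout(
    argv: Sequence[str],
    timeout_s: float,
    rsc: str,
    action: str,
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, Optional[subprocess.CompletedProcess]]:
    """Run an agent command; if it times out, run the PCMK_timeout_prog callout.

    Returns ("done", None) on normal completion, or ("timeout", result) where
    result is the callout's CompletedProcess (None if no callout is configured).
    """
    proc = subprocess.Popen(argv)
    try:
        proc.wait(timeout=timeout_s)
        return "done", None
    except subprocess.TimeoutExpired:
        pass  # fall through to the timeout handling below

    callout = None
    prog = env.get("PCMK_timeout_prog")
    try:
        if prog:
            # Hypothetical convention: pass resource, action and the PID of the
            # hung agent, so the callout can inspect it before it is killed.
            callout = subprocess.run(
                shlex.split(prog) + [rsc, action, str(proc.pid)],
                capture_output=True, text=True, timeout=30, check=False,
            )
    finally:
        # Only after the callout has run is the timed-out agent killed and reaped.
        proc.kill()
        proc.wait()
    return "timeout", callout


if __name__ == "__main__":
    # Example: a "monitor" that sleeps longer than its 1-second timeout.
    status, result = run_with_timeout_callout(
        [sys.executable, "-c", "import time; time.sleep(5)"], 1.0, "demo-rsc", "monitor")
    print(status, result.stdout if result else "(no PCMK_timeout_prog set)")
```

The design choice worth noting is the ordering: running the callout before killing the agent gives it a chance to capture the state of the hung process, which is the debug information the user is missing.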
If you create a pull request directly, we can use
GitHub for the discussion.
Pacemaker already has an alerts feature that can
call scripts on various events, one of which is
resource actions. So it might make sense to consider
extending that feature to cover your case as well.
At the moment, the RA's return code is passed to
your script. I'm actually unsure what happens in
the case of a timeout.
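To illustrate what an alert script sees for a resource action, here is a minimal Python sketch of an alert agent. It relies on the `CRM_alert_*` environment variables Pacemaker sets for alert agents (`CRM_alert_kind`, `CRM_alert_rsc`, `CRM_alert_task`, `CRM_alert_node`, `CRM_alert_rc`, `CRM_alert_status`, `CRM_alert_desc`). The function name is hypothetical, and printing to stdout stands in for whatever logging a real agent would do (a file or syslog is more likely).

```python
import os
from typing import Mapping, Optional


def describe_resource_alert(env: Mapping[str, str]) -> Optional[str]:
    """Summarise a Pacemaker resource alert from the agent's environment.

    Returns None for non-resource alerts (node, fencing, attribute).
    """
    if env.get("CRM_alert_kind") != "resource":
        return None
    return "{rsc} {task} on {node}: rc={rc} status={status} ({desc})".format(
        rsc=env.get("CRM_alert_rsc", "?"),
        task=env.get("CRM_alert_task", "?"),
        node=env.get("CRM_alert_node", "?"),
        rc=env.get("CRM_alert_rc", "?"),          # the RA's return code
        status=env.get("CRM_alert_status", "?"),  # Pacemaker's operation status
        desc=env.get("CRM_alert_desc", ""),
    )


if __name__ == "__main__":
    line = describe_resource_alert(os.environ)
    if line is not None:
        print(line)
```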
To be called only on a timeout, additional filtering
might be handy (filtering inside the script still
generates load, since the script runs for every
action), and a synchronous-call flag could be useful
(at the moment alerts are called in a more
fire-and-forget manner so as not to throttle
pacemaker actions).
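Filtering inside the alert script, as it would have to be done today, might look like the following Python sketch. It assumes that a timed-out action is reported with `CRM_alert_status` set to 2 (the timeout value of Pacemaker's operation-status enum; verify this against the headers of the Pacemaker version in use). The collector path `/usr/local/sbin/collect-ra-debug` and the function names are hypothetical and not part of Pacemaker.

```python
import os
import subprocess
from typing import List, Mapping, Optional

# Assumption: operation status 2 means "timed out"; check your Pacemaker version.
TIMEOUT_STATUS = "2"

# Hypothetical site-specific debug collector, not shipped with Pacemaker.
DEBUG_COLLECTOR = "/usr/local/sbin/collect-ra-debug"


def is_timeout_alert(env: Mapping[str, str]) -> bool:
    """True only for resource alerts whose operation status is a timeout."""
    return (env.get("CRM_alert_kind") == "resource"
            and env.get("CRM_alert_status") == TIMEOUT_STATUS)


def collector_argv(env: Mapping[str, str]) -> Optional[List[str]]:
    """Command line for the collector, or None if this alert is not a timeout."""
    if not is_timeout_alert(env):
        return None  # cheap early exit for the common, non-timeout case
    return [DEBUG_COLLECTOR, env.get("CRM_alert_rsc", ""), env.get("CRM_alert_task", "")]


if __name__ == "__main__":
    argv = collector_argv(os.environ)
    if argv is not None:
        try:
            subprocess.run(argv, check=False)
        except FileNotFoundError:
            pass  # collector not installed; nothing to do
```

The agent is still spawned for every resource action and merely exits early when the status is not a timeout; that per-action process start is the load a built-in filter would avoid. And because alerts are fire-and-forget, the collector runs without holding up Pacemaker's actions, which is why a synchronous-call flag would be needed to gather data before Pacemaker moves on.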