[Pacemaker] Enable remote monitoring

Gao,Yan ygao at suse.com
Thu Jan 24 03:05:40 EST 2013

Hi David,
Thanks for the comments!

On 01/24/13 00:36, David Vossel wrote:
> ----- Original Message -----
>> From: "Yan Gao" <ygao at suse.com>
>> To: pacemaker at oss.clusterlabs.org
>> Sent: Monday, January 21, 2013 11:28:40 PM
>> Subject: Re: [Pacemaker] Enable remote monitoring
>> Hi,
>> Here's the code for supporting nagios plugins in lrmd:
>> https://github.com/gao-yan/pacemaker/commits/nagios
>> A new resource class "nagios" is introduced.
>> Actions:
>> - probe: A resource defined for a resource container is not probed.
>> (We can also add a condition in pengine to just avoid probing a nagios
>> class resource.)
> Yeah, I think the pengine should know to never probe a nagios script, regardless of whether it is involved in a container or not.
>> - start: Invokes the nagios plugin with specified parameters (Maps the
>> instance attributes to the long options of the nagios plugin). If it
>> returns non-OK, re-invokes it after some delay (delay = start_timeout /
>> 10), until it returns OK or exceeds the start timeout.
> I made a comment about this on the patch.  Shouldn't the cmd->timeout value be updated each time it is re-scheduled to account for time already spent?
Ah, you are right! Changed, still in
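
To make the point about the shrinking timeout concrete, here is a rough sketch of the retry loop in Python. It is only an illustration, not the lrmd C code from the branch: `invoke`, `clock` and `sleep` are hypothetical names, and `invoke(timeout)` is assumed to run the plugin with the given remaining timeout and return its exit code.

```python
import time


def start_with_retry(invoke, start_timeout, clock=time.monotonic, sleep=time.sleep):
    """Re-invoke a plugin until it returns 0 (OK) or the start timeout is exceeded.

    Returns a (succeeded, attempts) tuple. `invoke` is a hypothetical callable
    taking the remaining timeout in seconds and returning the plugin exit code.
    """
    delay = start_timeout / 10            # retry delay as described in the thread
    deadline = clock() + start_timeout
    attempts = 0
    while True:
        remaining = deadline - clock()    # account for the time already spent
        if remaining <= 0:
            return False, attempts        # start timeout exceeded
        attempts += 1
        if invoke(remaining) == 0:
            return True, attempts         # plugin reported OK
        # Wait before the next attempt, but never sleep past the deadline.
        sleep(min(delay, max(0.0, deadline - clock())))
```

Each attempt is handed only the time that is left, so the sum of all attempts and delays cannot exceed the original start timeout.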

>> - monitor: Recurring invocation of the nagios plugin with specified
>> parameters.
>> - stop: Nothing special is done. The recurring monitor is canceled
>> anyway.
>> - metadata: Reads the corresponding metadata from an XML file in
>> (As we know, nagios plugins don't support metadata. The current plan is
>> to generate the corresponding metadata from the help output of the
>> plugins, and put it into NAGIOS_METADATA_DIR for use -- Dejan already
>> has progress on this. Thanks, Dejan!)
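
As an illustration of how the instance attributes could be mapped to long options for the start and monitor actions listed above, here is a small Python sketch. The function name, the plugin path and the attribute names in the usage example are hypothetical, and passing each value as a separate argument (rather than `--name=value`) is an assumption; the thread does not say which form the patch uses.

```python
def build_plugin_argv(plugin_path, instance_attributes):
    """Build an argument vector: each attribute name becomes a long option."""
    argv = [plugin_path]
    for name, value in instance_attributes.items():
        argv.append(f"--{name}")
        # Attributes without a value are passed as bare flags (an assumption).
        if value is not None and value != "":
            argv.append(str(value))
    return argv


# Hypothetical usage: a TCP check against a guest's SSH port.
argv = build_plugin_argv(
    "/usr/lib/nagios/plugins/check_tcp",
    {"hostname": "192.168.122.10", "port": "22"},
)
```

The resulting list can be handed to something like `subprocess.run(argv)`, which avoids any shell quoting problems with attribute values.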
>> For nagios plugins, the exit codes are:
>> STATE_OK        = 0,
>> AFAICS, STATE_OK should map to PCMK_EXECRA_OK, and the others should all
>> belong to PCMK_EXECRA_UNKNOWN_ERROR. Well, apparently, there's no code
>> to express "NOT_RUNNING" in nagios plugins. I think it should be fine,
>> since there's no probe.
>> Any suggestions are appreciated!
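
The proposed return code mapping is small enough to sketch in a few lines of Python. This is an illustration only: the function name is made up, and the numeric values given to the PCMK_EXECRA_* constants are an assumption that mirrors the usual OCF codes (0 for OK, 1 for a generic error).

```python
STATE_OK = 0                      # nagios plugin exit code quoted above

PCMK_EXECRA_OK = 0                # assumed numeric value
PCMK_EXECRA_UNKNOWN_ERROR = 1     # assumed numeric value


def map_nagios_rc(rc):
    """Map a nagios plugin exit code to the proposed Pacemaker result."""
    # Only STATE_OK counts as success; every other code is treated as an
    # unknown error, and nothing maps to "not running" since there is no probe.
    return PCMK_EXECRA_OK if rc == STATE_OK else PCMK_EXECRA_UNKNOWN_ERROR
```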
> This mostly looks like what I expected.  I'm letting the whole re-scheduling of the start operation roll around in my head a bit.  It almost seems like that functionality belongs in the service library...  retry executing this action until either the timeout is hit or some target return code is encountered.  Any thoughts on that?
The handling mainly operates on an "lrmd_cmd_t": resetting some of its
variables, adding it to the resource's pending operations, and
triggering. It doesn't seem necessary to put it in the service library.

Gao,Yan <ygao at suse.com>
Software Engineer
China Server Team, SUSE.
