[ClusterLabs] pacemaker alerts list

Thu Jul 18 09:25:20 EDT 2019

On 17/07/19 19:07 +0000, Gershman, Vladimir wrote:
> This would be for the Pacemaker.
> 
> Seems like the alerts in the link you sent, refer to numeric codes,
> so where would I see all the codes and their meanings ?  This would
> allow a way to select what I need to monitor. 

Unfortunately, there is currently no way around direct references
to the source code. Keep in mind, though, that we are already in
expert territory when the usefulness of alerts is to be maxed out.
Whoever wraps these details this tightly is also responsible for
updating the handler if or when incompatible changes arrive in new
versions. Presumably, any more disruptive release would signal that
in the versioning scheme (a non-minor component being incremented),
and perhaps it would be noted in the release notes as well.

That being said, more detailed documentation, perhaps accompanied
by firmer assurances about the so far vaguely specified data items
attached to each alert, may arrive in the future. I bet contributions
to make that happen faster are warmly welcome, especially when driven
by real production needs.

> For example:

[intentionally reordered]

> CRM_alert_status:
>   A numerical code used by Pacemaker to represent the operation
>   result (resource alerts only)  

See
https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.2/include/crm/services.h#L118-L129

> CRM_alert_desc:
>   Detail about event. For node alerts, this is the node's current
>   state (member or lost).

These are the literal strings, see
https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.2/include/crm/cluster.h#L30-L31
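
As a small illustration of relying on those two literals, a handler
could branch on them roughly as sketched below (Python). The values
"member" and "lost" are the ones quoted above; CRM_alert_kind is the
variable the alerts documentation uses for the alert type. The function
name and its return values are made up for this example.

```python
import os

# The two literal node states quoted above.
NODE_MEMBER = "member"
NODE_LOST = "lost"


def classify_node_alert(env):
    """Return "joined", "left", or None for anything that is not a
    recognized node alert. `env` is a mapping such as os.environ."""
    if env.get("CRM_alert_kind") != "node":
        return None
    desc = env.get("CRM_alert_desc")
    if desc == NODE_MEMBER:
        return "joined"
    if desc == NODE_LOST:
        return "left"
    # Unexpected value: report nothing rather than guess.
    return None


if __name__ == "__main__":
    print(classify_node_alert(os.environ))
```

Returning None for unexpected values keeps the handler quiet should
the set of strings ever change in a later version.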

>   For fencing alerts, this is a summary of the requested fencing
>   operation, including origin, target, and fencing operation error
>   code, if any.

This would indeed require extensive parsing of the generated string
to get at fields that are not exposed as standalone variables (the
node to be fenced, in contrast, is also available separately via
CRM_alert_node):

https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.2/daemons/controld/controld_execd_state.c#L805-L809

>   For resource alerts, this is a readable string equivalent of
>   CRM_alert_status.  

See the first link above for the codes. The translation from numeric
codes to strings is rather symbolic, though:

https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.2/include/crm/services.h#L331-L340

(Since that translation does not cover the full enumeration, it may
indicate that some of the codes are strictly internal, but I am not
sure.)

Plus, there's an exception for operations already known to have
finished: for those, the exit status of the actual agent's execution
is reproduced here in words, and luckily, that's actually documented:

https://github.com/ClusterLabs/resource-agents/blob/v4.3.0/doc/dev-guides/ra-dev-guide.asc#return-codes

> CRM_alert_target_rc:
>   The expected numerical return code of the operation (resource
>   alerts only)  

This appears to be primarily bound to the OCF return codes referred
to just above.
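
To make the numeric codes easier to read in a handler, one could keep
a small lookup table, sketched here in Python. The names and numbers
are the OCF return codes from the developer's guide linked above.
CRM_alert_rc (the actual return code) is not quoted in this thread;
it comes from the same alerts documentation as CRM_alert_target_rc.
The helper names are hypothetical.

```python
# OCF return codes as listed in the resource agent developer's guide.
OCF_RETURN_CODES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}


def describe_rc(raw):
    """Translate a return code (as found in the environment, i.e. a
    string) into its OCF name, without failing on odd input."""
    try:
        code = int(raw)
    except (TypeError, ValueError):
        return "unparsable ({!r})".format(raw)
    return OCF_RETURN_CODES.get(code, "unknown ({})".format(code))


def result_matches_expectation(env):
    """Compare CRM_alert_rc with CRM_alert_target_rc. Returns True or
    False, or None when either value is missing or not a number."""
    try:
        return int(env["CRM_alert_rc"]) == int(env["CRM_alert_target_rc"])
    except (KeyError, ValueError):
        return None
```

A handler interested only in unexpected results could, for instance,
act solely when result_matches_expectation(os.environ) is False.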

* * *

Hopefully that's enough to get you started with your own exploration.
Initially, I'd also suggest attaching your own dump-all alert handler
to get hands-on experience with the data at your disposal, which you
can then leverage in your real handler.
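
Such a dump-all handler could look roughly like the Python sketch
below. It assumes that Pacemaker passes the alert data to the handler
as environment variables prefixed with CRM_alert_, and it treats
CRM_alert_recipient, if set, as a path of a file to append to. That
last bit is a choice made for this example only; the function names
are made up as well.

```python
#!/usr/bin/env python3
import os
import sys
import time

PREFIX = "CRM_alert_"


def collect_alert_vars(env):
    """Pick every CRM_alert_* variable from `env`, sorted by name."""
    return {key: env[key] for key in sorted(env) if key.startswith(PREFIX)}


def format_dump(alert_vars, timestamp):
    """Render one alert as a header line plus NAME=value lines."""
    lines = ["--- alert at {} ---".format(timestamp)]
    lines.extend("{}={}".format(k, v) for k, v in alert_vars.items())
    return "\n".join(lines) + "\n"


def main():
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    text = format_dump(collect_alert_vars(os.environ), stamp)
    # Assumption for this sketch: the recipient is a file path.
    target = os.environ.get("CRM_alert_recipient")
    if target:
        with open(target, "a") as handle:
            handle.write(text)
    else:
        sys.stdout.write(text)


if __name__ == "__main__":
    main()
```

Running it for a while against real cluster events shows which
variables are populated for which kind of alert, and with what values.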

-- 
Jan (Poki)